
Kubernetes Labs¶
This is a comprehensive collection of hands-on labs designed to help you learn and master Kubernetes concepts, from basic deployments to advanced topics like Istio, ArgoCD and custom schedulers.
What You’ll Learn¶
This lab series covers a wide range of Kubernetes topics:
- Basics: Namespaces, Deployments, Services and Rollouts
- Storage: DataStores, Persistent Volume Claims and StatefulSets
- Networking: Ingress Controllers and Service Mesh (Istio)
- Configuration Management: Kustomization and Helm Charts
- GitOps: ArgoCD for continuous deployment
- Observability: Istio, Kiali, Logging, Prometheus and Grafana
- Advanced Topics: Custom Resource Definitions (CRDs), Custom Schedulers and Pod Disruption Budgets
- Tools: k9s, Krew, Kubeapps, Kubeadm and Rancher
Prerequisites¶
Before starting these labs, you should have:
- Basic understanding of containerization (Docker)
- Command-line (CLI) familiarity
- A Kubernetes cluster (Minikube, Kind, or cloud-based cluster)
- kubectl installed and configured
Recommended Software Installations:
| Tool Name | Description |
|---|---|
| DevBox | Development environment manager |
| Docker | Containerization tool |
| Git | Version control system |
| Helm | Kubernetes package manager |
| Kubernetes | Container orchestration platform |
| Node.js | JavaScript runtime environment |
| Visual Studio Code | Source code editor |
| k9s | Kubernetes CLI tool |
| Kind | Kubernetes cluster |
| kubectl | Kubernetes command-line tool |
DevBox Installation¶
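This section has no commands in the source; a sketch based on DevBox's publicly documented installer script (URL assumed from the Jetify docs — verify it before piping to a shell):

```shell
# Install DevBox via the Jetify installer script
curl -fsSL https://get.jetify.com/devbox | bash

# Verify the installation
devbox version
```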
Docker Installation¶
# Set up the repository
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# Install Docker
sudo yum install -y docker-ce docker-ce-cli containerd.io
# Start Docker
sudo systemctl start docker
# Add user to docker group
sudo usermod -aG docker $USER
# Restart session or run:
newgrp docker
Git Installation¶
Download Git from the official website: https://git-scm.com/download/win
Helm Installation¶
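The Helm section is empty in the source; a sketch using the official Helm 3 install script (hosted in the Helm project's GitHub repository):

```shell
# Download and run the official Helm 3 install script
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

# Verify the installation
helm version
```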
kubectl Installation¶
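This section is empty in the source; a sketch following the official Kubernetes docs for installing kubectl on Linux (amd64):

```shell
# Download the latest stable kubectl binary
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"

# Install it into your PATH
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Verify the client version
kubectl version --client
```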
Node.js Installation¶
Visual Studio Code Installation¶
# Install VS Code using snap
sudo snap install code --classic
# Or using apt repository
# wget -qO- https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > packages.microsoft.gpg
# sudo install -o root -g root -m 644 packages.microsoft.gpg /etc/apt/trusted.gpg.d/
# sudo sh -c 'echo "deb [arch=amd64,arm64,armhf signed-by=/etc/apt/trusted.gpg.d/packages.microsoft.gpg] https://packages.microsoft.com/repos/code stable main" > /etc/apt/sources.list.d/vscode.list'
# sudo apt update
# sudo apt install code
# Start VS Code
code .
# Import Microsoft GPG key
sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc
# Add VS Code repository
sudo sh -c 'echo -e "[code]\nname=Visual Studio Code\nbaseurl=https://packages.microsoft.com/yumrepos/vscode\nenabled=1\ngpgcheck=1\ngpgkey=https://packages.microsoft.com/keys/microsoft.asc" > /etc/yum.repos.d/vscode.repo'
# Install VS Code
sudo yum install -y code
# Start VS Code
code .
Download Visual Studio Code from: https://code.visualstudio.com/download
k9s Installation¶
Kind Installation¶
Getting Started¶
Let’s dive into the world of Kubernetes together!
Kubernetes Labs¶
Lab Overview¶
Welcome to the hands-on Kubernetes labs! This comprehensive series of labs will guide you through essential Kubernetes concepts and advanced topics.
Available Labs¶
Getting Started¶
| Lab | Topic | Description |
|---|---|---|
| 00 | Verify Cluster | Ensure your Kubernetes cluster is properly configured |
| 01 | Namespace | Learn to organize resources with namespaces |
| 02 | Deployments (Imperative) | Create deployments using kubectl commands |
| 03 | Deployments (Declarative) | Create deployments using YAML manifests |
| 04 | Rollout | Manage deployment updates and rollbacks |
| 20 | CronJob | Schedule recurring tasks |
Networking¶
| Lab | Topic | Description |
|---|---|---|
| 05 | Services | Expose applications with Kubernetes services |
| 07 | Nginx Ingress | Configure ingress controllers for external access |
| 10 | Istio | Implement service mesh for microservices |
| 33 | NetworkPolicies | Control traffic flow between pods |
Security¶
| Lab | Topic | Description |
|---|---|---|
| 31 | RBAC | Role-based access control for Kubernetes |
| 32 | Secrets | Manage sensitive data in Kubernetes |
| 33 | NetworkPolicies | Control traffic flow between pods |
| 35 | Secret Management | Advanced secret management strategies |
| 37 | ResourceQuotas & LimitRanges | Manage resource consumption per namespace |
Storage & Config¶
| Lab | Topic | Description |
|---|---|---|
| 06 | DataStore | Work with persistent storage in Kubernetes |
| 08 | Kustomization | Manage configurations with Kustomize |
| 09 | StatefulSet | Deploy stateful applications |
| 12 | WordPress MySQL PVC | Complete stateful application with persistent storage |
Observability¶
| Lab | Topic | Description |
|---|---|---|
| 14 | Logging | Centralized logging with Fluentd |
| 15 | Prometheus & Grafana | Monitoring and visualization |
| 29 | EFK Stack | Elasticsearch, Fluentd, and Kibana stack |
GitOps & CI/CD¶
| Lab | Topic | Description |
|---|---|---|
| 13 | HelmChart | Package and deploy applications with Helm |
| 18 | ArgoCD | Implement GitOps with ArgoCD |
| 23 | Helm Operator | Manage Helm releases with operators |
Advanced¶
| Lab | Topic | Description |
|---|---|---|
| 11 | Custom Resource Definition | Extend Kubernetes API with CRDs |
| 16 | Affinity, Taint & Toleration | Control pod scheduling |
| 17 | Pod Disruption Budgets | Ensure availability during disruptions |
| 19 | Custom Scheduler | Build custom scheduling logic |
| 21 | KubeAPI | Work with Kubernetes API |
| 24 | Kubebuilder | Build Kubernetes operators |
| 28 | Telepresence | Local development with remote clusters |
| 30 | KEDA | Kubernetes event-driven autoscaling |
| 34 | crictl | Container runtime interface CLI |
| 36 | kubectl Deep Dive | Advanced kubectl usage and techniques |
Practice Tasks¶
| Task Category | Description |
|---|---|
| Tasks Overview | Overview of all available practice tasks |
| CLI Tasks | Hands-on exercises for CLI, debugging, and orchestration |
| Service Tasks | Practice with Kubernetes services and networking |
| Helm Tasks | Helm chart creation, templating, repositories, and deployment |
| ArgoCD Tasks | GitOps workflows with ArgoCD |
| Scheduling Tasks | Pod scheduling, affinity, and resource management |
| Kubebuilder Tasks | Building Kubernetes operators |
| KEDA Tasks | Event-driven autoscaling exercises |
Learning Path¶
Beginner Track¶
Start here if you’re new to Kubernetes:
- Lab 00: Verify Cluster
- Lab 01: Namespace
- Lab 02: Deployments (Imperative)
- Lab 03: Deployments (Declarative)
- Lab 05: Services
Intermediate Track¶
For those with basic Kubernetes knowledge:
Advanced Track¶
For experienced Kubernetes users:
- Lab 10: Istio
- Lab 11: Custom Resource Definition
- Lab 18: ArgoCD
- Lab 19: Custom Scheduler
- Lab 24: Kubebuilder
Tips for Success¶
- Take your time: Don’t rush through the labs
- Practice regularly: Repetition builds muscle memory
- Experiment: Try variations of the examples
- Read the docs: Kubernetes documentation is excellent
- Join the community: Engage with other learners
Get Started¶
Ready to begin? Click on any lab on the left menu, or start with Lab 00: Verify Cluster!
Verify Cluster¶
- In this lab we will set up a local Kubernetes cluster using Kind and verify that it is working correctly.
- By the end of this lab you will have a running Kubernetes cluster and confirmed connectivity.
What will we learn?¶
- How to install Kind (Kubernetes in Docker)
- How to create a local Kubernetes cluster
- How to verify cluster connectivity using kubectl
Prerequisites¶
- Docker installed and running
- kubectl installed
01. Install Kind¶
- If you don’t have an existing cluster, you can use Google Cloud for the hands-on labs.
- Click on the button below to run the labs on Google Cloud Shell (use CTRL + click to open in a new window).
- Run the following commands based on your operating system:
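The OS-specific commands were lost here; for Linux, the Kind quick-start install can be sketched as follows (the version below is pinned as an example — check the latest release):

```shell
# Download the kind binary (pin a version appropriate for your setup)
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind

# Verify the installation
kind version
```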
02. Create a Kind Cluster¶
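The command itself is missing in the source; creating a default cluster (which produces the output below) is:

```shell
# Create a local cluster named "kind" (the default)
kind create cluster
```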
- You should see an output like this:
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.27.3)
✓ Preparing nodes
✓ Writing configuration
✓ Starting control-plane
✓ Installing CNI
✓ Installing StorageClass
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Thanks for using kind!
03. Check the Cluster Status¶
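The command was omitted here; based on the output shown below, it is:

```shell
# Check that the control plane and DNS are reachable
kubectl cluster-info
```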
- You should see output similar to this one:
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
04. Verify the Cluster is Up and Running¶
- Verify that kubectl is installed and configured:
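The command is missing in the source; the kubeconfig shown below comes from:

```shell
# Print the merged kubeconfig (sensitive data is redacted)
kubectl config view
```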
- You should get something like the following
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://127.0.0.1:6443
  name: kind-kind
contexts:
- context:
    cluster: kind-kind
    user: kind-kind
  name: kind-kind
current-context: kind-kind
kind: Config
preferences: {}
users:
- name: kind-kind
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
05. Verify That You Can Talk to Your Cluster¶
- You should see output similar to this:
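Both the command and the output were lost here; a standard connectivity check (the node name, age, and version will differ on your cluster) is:

```shell
# List the cluster nodes
kubectl get nodes

# Representative output for a single-node Kind cluster:
# NAME                 STATUS   ROLES           AGE   VERSION
# kind-control-plane   Ready    control-plane   2m    v1.27.3
```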
Namespaces¶
- Kubernetes supports multiple virtual clusters backed by the same physical cluster.
- These virtual clusters are called namespaces. Namespaces are the default way for Kubernetes to separate resources.
- Using namespaces we can isolate development environments, improve security and much more.
- Every Kubernetes cluster has a built-in namespace called default, and may contain more namespaces, like kube-system, for example.
What will we learn?¶
- How to create a Kubernetes namespace
- How to set a default namespace for kubectl
- How to verify the current namespace configuration
- How to use the -n flag to target specific namespaces
Prerequisites¶
- A running Kubernetes cluster (kubectl cluster-info should work)
- kubectl configured against the cluster
01. Create Namespace¶
# In this sample `codewizard` is the desired namespace
kubectl create namespace codewizard
namespace/codewizard created
# Try to create the following namespace (note the "_" and the trailing "-") and see what happens:
kubectl create namespace my_namespace-
02. Setting the Default Namespace for kubectl¶
- To set the default namespace run:
kubectl config set-context $(kubectl config current-context) --namespace=codewizard
Context minikube modified.
03. Verify That You’ve Updated the Namespace¶
kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
docker-desktop docker-desktop docker-desktop
docker-for-desktop docker-desktop docker-desktop
* minikube minikube minikube codewizard
04. Using the -n Flag¶
- When using kubectl you can pass the -n flag in order to execute the command against a desired namespace.
- For example:
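The example itself is missing in the source; a minimal illustration of the -n flag:

```shell
# List pods in the codewizard namespace only
kubectl get pods -n codewizard

# The flag works with any command, e.g. against a system namespace:
kubectl get services -n kube-system
```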
Deployment - Imperative¶
- In this lab we will create Kubernetes deployments using imperative kubectl commands.
- We will deploy a multitool container, expose it as a service, and test connectivity.
What will we learn?¶
- How to create a deployment using kubectl create
- How to expose a deployment as a NodePort service
- How to find the assigned IP and port
- How to test the deployment using curl
Prerequisites¶
- A running Kubernetes cluster (kubectl cluster-info should work)
- kubectl configured against the cluster
01. Create Namespace¶
- As completed in the previous lab, create the desired namespace [codewizard]:
- In order to set this as the default namespace, please refer to set default namespace.
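The command itself is omitted in the source; as in the previous lab:

```shell
# Create the namespace used throughout this lab
kubectl create namespace codewizard
```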
02. Deploy Multitool Image¶
- We start by creating the following deployment from the praqma/network-multitool image.
- This is a multitool for container/network testing and troubleshooting.
# Deploy the first container
kubectl create deployment multitool -n codewizard --image=praqma/network-multitool
deployment.apps/multitool created
- kubectl create deployment actually creates a ReplicaSet for us.
- We can verify it by running:
kubectl get all -n codewizard
## Expected output:
NAME READY UP-TO-DATE AVAILABLE
deployment.apps/multitool 1/1 1 1
NAME DESIRED CURRENT READY
replicaset.apps/multitool-7885b5f94f 1 1 1
NAME READY STATUS RESTARTS
pod/multitool-7885b5f94f-9s7xh 1/1 Running 0
03. Test the Deployment¶
- The above deployment contains a container named multitool.
- In order to access this multitool container, we need to create a resource of type Service which will “open” the server for incoming traffic.
Create a service using kubectl expose¶
# "Expose" the desired port for incoming traffic
# This command is equivalent to declaring a `kind: Service` in a YAML file
kubectl expose deployment -n codewizard multitool --port 80 --type NodePort
service/multitool exposed
- Verify that the service has been created by running:
kubectl get service -n codewizard
# The output should be something like
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/multitool NodePort 10.102.73.248 <none> 80:31418/TCP 3s
Find the Port and IP Assigned to Our Pod¶
- Grab the port from the previous output.
- Port: in the above sample it is 31418 [80:31418/TCP]
- IP: we will grab the cluster IP using kubectl cluster-info
# get the IP
kubectl cluster-info
# You should get output similar to this one
Kubernetes control plane is running at https://192.168.49.2:8443
KubeDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
# Programmatically get the port and the IP
CLUSTER_IP=$(kubectl get nodes \
--selector=node-role.kubernetes.io/control-plane \
-o jsonpath='{$.items[*].status.addresses[?(@.type=="InternalIP")].address}')
NODE_PORT=$(kubectl get -o \
jsonpath="{.spec.ports[0].nodePort}" \
services multitool -n codewizard)
- In this sample the cluster-ip is
192.168.49.2
Test the Deployment¶
- Test the deployment using the IP address and port number we have retrieved above.
- Execute curl with the following parameters: http://${CLUSTER_IP}:${NODE_PORT}
curl http://${CLUSTER_IP}:${NODE_PORT}
# Or in the above sample
curl 192.168.49.2:31418
# The output should be similar to this:
Praqma Network MultiTool (with NGINX) ...
- If you get the above output, congratulations! You have successfully created a deployment using imperative commands.
Cleanup¶
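The cleanup commands are missing in the source; removing the resources created in this lab would look like:

```shell
# Delete the service and deployment created in this lab
kubectl delete service multitool -n codewizard
kubectl delete deployment multitool -n codewizard
```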
Deployment - Declarative¶
- In this lab we will create Kubernetes deployments using declarative YAML files.
- We will deploy nginx, scale it up and down, and observe how Kubernetes manages replicas.
What will we learn?¶
- How to create a deployment using a YAML file
- How to apply changes using
kubectl apply - How to scale replicas declaratively and imperatively
- How Kubernetes handles scaling up and down
Prerequisites¶
- A running Kubernetes cluster (kubectl cluster-info should work)
- kubectl configured against the cluster
01. Create Namespace¶
- As completed in the previous lab, create the desired namespace [codewizard]:
- In order to set this as the default namespace, please refer to set default namespace.
02. Deploy nginx Using YAML File (Declarative)¶
- Let’s create the YAML file for the deployment.
- If this is your first k8s YAML file, it’s advisable to type it out in order to get a feel for the structure.
- Save the file with the following name: nginx.yaml
apiVersion: apps/v1
kind: Deployment              # We use a Deployment and not a bare Pod
metadata:
  name: nginx                 # Deployment name
  namespace: codewizard
  labels:
    app: nginx                # Deployment label
spec:
  replicas: 2
  selector:
    matchLabels:              # Labels for the replica selector
      app: nginx
  template:
    metadata:
      labels:
        app: nginx            # Labels for the replica selector
        version: "1.17"       # Specify a specific version if required
    spec:
      containers:
        - name: nginx             # The name of the container
          image: nginx:1.17       # The image which we will deploy
          ports:
            - containerPort: 80
- Create the deployment using the -f flag and --record=true:
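The command itself is omitted in the source; with the flags mentioned above it would be (note that --record is deprecated in recent kubectl releases):

```shell
kubectl apply -f nginx.yaml --record=true
```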
03. Verify That the Deployment Has Been Created¶
kubectl get deployments -n codewizard
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE
multitool 1 1 1 1
nginx 1 1 1 1
04. Check if the Pods Are Running¶
kubectl get pods -n codewizard
NAME READY STATUS RESTARTS
multitool-7885b5f94f-9s7xh 1/1 Running 0
nginx-647fb5956d-v8d2w 1/1 Running 0
05. Playing with K8S Replicas¶
- Let’s play with the replicas and see K8S in action.
- Open a second terminal and execute:
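The command to run in the second terminal (whose output appears in step 07) is:

```shell
# Watch pod changes live
kubectl get pods --watch -n codewizard
```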
06. Update the nginx.yaml File with Replica’s Value of 5¶
07. Update the Deployment Using kubectl apply¶
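The command itself is omitted in the source; after editing the file, re-apply it:

```shell
kubectl apply -f nginx.yaml
```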
- Switch to the second terminal and you should see something like the following:
kubectl get pods --watch -n codewizard
NAME READY STATUS RESTARTS AGE
multitool-74477484b8-dj7th 1/1 Running 0 20m
nginx-dc8bb9b45-hqdv9 1/1 Running 0 111s
nginx-dc8bb9b45-vdmp5 0/1 Pending 0 0s
nginx-dc8bb9b45-28wwq 0/1 Pending 0 0s
nginx-dc8bb9b45-wkc68 0/1 Pending 0 0s
nginx-dc8bb9b45-vdmp5 0/1 Pending 0 0s
nginx-dc8bb9b45-28wwq 0/1 Pending 0 0s
nginx-dc8bb9b45-x7j4g 0/1 Pending 0 0s
nginx-dc8bb9b45-wkc68 0/1 Pending 0 0s
nginx-dc8bb9b45-x7j4g 0/1 Pending 0 0s
nginx-dc8bb9b45-vdmp5 0/1 ContainerCreating 0 0s
nginx-dc8bb9b45-28wwq 0/1 ContainerCreating 0 0s
nginx-dc8bb9b45-wkc68 0/1 ContainerCreating 0 0s
nginx-dc8bb9b45-x7j4g 0/1 ContainerCreating 0 0s
nginx-dc8bb9b45-vdmp5 1/1 Running 0 2s
nginx-dc8bb9b45-28wwq 1/1 Running 0 3s
nginx-dc8bb9b45-x7j4g 1/1 Running 0 3s
nginx-dc8bb9b45-wkc68 1/1 Running 0 3s
- Can you explain what you see?
- Why are there more entries than the number of replicas requested?
08. Scaling Down with kubectl scale¶
- Scaling down using kubectl, and not by editing the YAML file:
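The scale command is omitted in the source; judging from the Terminating output below (four of the five pods are removed), the replica count here is assumed to be 1:

```shell
kubectl scale deployment nginx -n codewizard --replicas=1
```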
- Switch to the second terminal. The current output should show something like this:
NAME READY STATUS RESTARTS AGE
multitool-74477484b8-dj7th 1/1 Running 0 29m
nginx-dc8bb9b45-28wwq 1/1 Running 0 4m41s
nginx-dc8bb9b45-hqdv9 1/1 Running 0 10m
nginx-dc8bb9b45-vdmp5 1/1 Running 0 4m41s
nginx-dc8bb9b45-wkc68 1/1 Running 0 4m41s
nginx-dc8bb9b45-x7j4g 1/1 Running 0 4m41s
nginx-dc8bb9b45-x7j4g 1/1 Terminating 0 6m21s
nginx-dc8bb9b45-vdmp5 1/1 Terminating 0 6m21s
nginx-dc8bb9b45-28wwq 1/1 Terminating 0 6m21s
nginx-dc8bb9b45-wkc68 1/1 Terminating 0 6m21s
nginx-dc8bb9b45-x7j4g 0/1 Terminating 0 6m22s
nginx-dc8bb9b45-vdmp5 0/1 Terminating 0 6m22s
nginx-dc8bb9b45-wkc68 0/1 Terminating 0 6m22s
nginx-dc8bb9b45-28wwq 0/1 Terminating 0 6m22s
nginx-dc8bb9b45-28wwq 0/1 Terminating 0 6m26s
nginx-dc8bb9b45-28wwq 0/1 Terminating 0 6m26s
nginx-dc8bb9b45-vdmp5 0/1 Terminating 0 6m26s
nginx-dc8bb9b45-vdmp5 0/1 Terminating 0 6m26s
nginx-dc8bb9b45-wkc68 0/1 Terminating 0 6m27s
nginx-dc8bb9b45-wkc68 0/1 Terminating 0 6m27s
nginx-dc8bb9b45-x7j4g 0/1 Terminating 0 6m27s
nginx-dc8bb9b45-x7j4g 0/1 Terminating 0 6m27s
Cleanup¶
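The cleanup commands are missing in the source; deleting via the same manifest would be:

```shell
# Remove everything defined in the manifest
kubectl delete -f nginx.yaml
```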
Rollout (Rolling Update)¶
- In this lab we will deploy the same application with several different versions and we will “switch” between them.
- For learning purposes we will play a little with the CLI.
What will we learn?¶
- How to perform rolling updates on a Kubernetes deployment
- How to inspect rollout history
- How to rollback to a previous version
- How to use rollout restart for quick restarts
Prerequisites¶
- A running Kubernetes cluster (kubectl cluster-info should work)
- kubectl configured against the cluster
01. Create Namespace¶
- As completed in the previous lab, create the desired namespace [codewizard]:
- In order to set this as the default namespace, please refer to set default namespace.
02. Create the desired deployment¶
- We will use the --save-config flag.
- save-config: if true, the configuration of the current object will be saved in its annotation. Otherwise, the annotation will be unchanged. This flag is useful when you want to perform kubectl apply on this object in the future.
- Let’s run the following (note that if this is already deployed, we will get an error message):
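The command is omitted in the source; based on the flag discussed above and the nginx:1.17 image that appears in the rollout history later, it is likely:

```shell
kubectl create deployment nginx -n codewizard --image=nginx:1.17 --save-config
```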
03. Expose nginx as a service¶
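The expose command is omitted in the source; based on the NodePort service shown in the next step:

```shell
kubectl expose deployment nginx -n codewizard --port 80 --type NodePort
```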
- Again, note that if this service already exists, we will get an error message as well.
04. Verify that the pods and the service are running¶
kubectl get all -n codewizard
# The output should be similar to this
NAME READY STATUS RESTARTS AGE
pod/nginx-db749865c-lmgtv 1/1 Running 0 66s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/nginx NodePort 10.102.79.9 <none> 80:31204/TCP 30s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx 1/1 1 1 66s
NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-db749865c 1 1 1 66s
05. Change the number of replicas to 3¶
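The command itself is omitted in the source; scaling imperatively:

```shell
kubectl scale deployment nginx -n codewizard --replicas=3
```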
06. Verify that now we have 3 replicas¶
kubectl get pods -n codewizard
NAME READY STATUS RESTARTS AGE
nginx-db749865c-f5mkt 1/1 Running 0 86s
nginx-db749865c-jgcvb 1/1 Running 0 86s
nginx-db749865c-lmgtv 1/1 Running 0 4m44s
07. Test the deployment¶
# Get the IP & port for this service
kubectl get services -n codewizard -o wide
# Write down the port number
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
nginx NodePort 10.102.79.9 <none> 80:31204/TCP 7m7s app=nginx
# Get the cluster IP and port
kubectl cluster-info
Kubernetes control plane is running at https://192.168.49.2:8443
# Using the above <host>:<port> test the nginx
# -I is for getting the headers
curl -sI <host>:<port>
# The response should display the nginx version
example: curl -sI 192.168.49.2:31204
HTTP/1.1 200 OK
Server: nginx/1.17.10 <------------ This is the pod version
Date: Fri, 15 Jan 2021 20:13:48 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 14 Apr 2020 14:19:26 GMT
Connection: keep-alive
ETag: "5e95c66e-264"
Accept-Ranges: bytes
...
08. Deploy another version of nginx¶
# Deploy another version of nginx (1.16)
kubectl set image deployment -n codewizard nginx nginx=nginx:1.16 --record
deployment.apps/nginx image updated
# Check to verify that the new version deployed - same as in previous step
curl -sI <host>:<port>
# The response should display the new version
HTTP/1.1 200 OK
Server: nginx/1.16.1 <------------ This is the pod version (new version)
Date: Fri, 15 Jan 2021 20:16:11 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 13 Aug 2019 10:05:00 GMT
Connection: keep-alive
ETag: "5d528b4c-264"
Accept-Ranges: bytes
09. Investigate rollout history¶
- The rollout history command prints out all the saved records:
kubectl rollout history deployment nginx -n codewizard
deployment.apps/nginx
REVISION CHANGE-CAUSE
1 <none>
2 kubectl set image deployment nginx nginx=nginx:1.16 --record=true
3 kubectl set image deployment nginx nginx=nginx:1.15 --record=true
10. Let’s see what was changed during the previous updates:¶
- Print out the rollout changes:
# Replace <X> with a revision id such as 1 or 2
kubectl rollout history deployment nginx -n codewizard --revision=<X>
deployment.apps/nginx with revision #1
Pod Template:
Labels: app=nginx
pod-template-hash=db749865c
Containers:
nginx:
Image: nginx:1.17
Port: <none>
Host Port: <none>
Environment: <none>
Mounts: <none>
Volumes: <none>
11. Undo the version upgrade by rolling back and restoring previous version¶
# Check the current nginx version
curl -sI <host>:<port>
# Undo the last deployment
kubectl rollout undo deployment nginx -n codewizard
deployment.apps/nginx rolled back
# Verify that we have the previous version
curl -sI <host>:<port>
12. Rolling Restart¶
- If we deploy with imagePullPolicy: Always set in the YAML file, we can use rollout restart to force K8S to pull the latest image.
- This is the fastest way to restart a deployment these days.
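The restart command itself is omitted in the source:

```shell
kubectl rollout restart deployment nginx -n codewizard
```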
CronJobs¶
- In this lab, we will learn how to create and manage CronJobs in Kubernetes.
- A CronJob creates Jobs on a time-based schedule. It is useful for running periodic and recurring tasks, such as backups or report generation.
What will we learn?¶
- What CronJobs are and how they work in Kubernetes
- How to create, monitor, and manage CronJobs
- How to view Job and Pod outputs from scheduled tasks
Prerequisites¶
- A running Kubernetes cluster (kubectl cluster-info should work)
- kubectl configured against the cluster
Introduction¶
- A CronJob in Kubernetes runs Jobs on a time-based schedule, similar to Linux cron.
- Useful for periodic tasks like backups, reports, or cleanup.
Step 01 - Create a CronJob YAML¶
- Create a file named hello-cronjob.yaml with the following content:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
  namespace: default
spec:
  schedule: "*/1 * * * *" # Every 1 minute
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: hello
              image: busybox
              args:
                - /bin/sh
                - -c
                - date; echo Hello from the Kubernetes CronJob!
          restartPolicy: OnFailure
Step 02 - Apply the CronJob¶
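The command itself is omitted in the source:

```shell
kubectl apply -f hello-cronjob.yaml
```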
Step 03 - Verify CronJob Creation¶
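The command itself is omitted in the source:

```shell
# Confirm the CronJob exists and see its schedule
kubectl get cronjob hello
```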
Step 04 - Check CronJob and Jobs¶
- List CronJobs:
- List Jobs created by the CronJob:
- List Pods created by Jobs:
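The three commands referenced by the bullets above are omitted in the source; they are:

```shell
# List CronJobs
kubectl get cronjobs

# List Jobs created by the CronJob
kubectl get jobs

# List Pods created by those Jobs
kubectl get pods
```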
Step 05 - View Job Output¶
- Get the name of a pod created by the CronJob, then view its logs:
Example output:
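Both the command and the sample output are missing in the source; the pod name below is hypothetical:

```shell
# Find the pod, then read its logs
kubectl get pods
kubectl logs hello-28112345-abcde   # hypothetical pod name

# The container runs `date` and then echoes a message, so the log should contain
# the current date followed by:
# Hello from the Kubernetes CronJob!
```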
Cleanup¶
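The cleanup command is missing in the source; deleting the CronJob also removes its Jobs and Pods:

```shell
kubectl delete cronjob hello
```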
Questions¶
- What happens if the job takes longer than the schedule interval?
- How would you change the schedule to run every 5 minutes?
- How can you limit the number of successful or failed jobs to keep?
Networking
Service Discovery¶
- In this lab we will learn what a Service is and go over the different Service types.
What will we learn?¶
- What a Kubernetes Service is and why you need one
- How to create and test ClusterIP, NodePort, and LoadBalancer services
- How to use Kubernetes internal DNS (FQDN) to access services
- The differences between the service types
Prerequisites¶
- A running Kubernetes cluster (kubectl cluster-info should work)
- kubectl configured against the cluster
01. Some General Notes on What is a Service¶
- A Service is a unit of application behavior bound to a unique name in a service registry.
- A Service consists of multiple network endpoints implemented by workload instances running in pods, containers, VMs, etc.
- A Service allows us to gain access to any given pod or container (e.g., a web service).
- A service is (normally) created on top of an existing deployment, exposing it to the “world” using IP(s) & port(s).
- K8S defines 3 main service types (plus FQDN internally), which means that we have 4 different ways to access pods.
- There are several proxy modes which implement different behaviors. For example, in userspace proxy mode, for each Service, kube-proxy opens a (randomly chosen) port on the local node. Any connections to this “proxy port” are proxied to one of the Service’s backend pods (as reported via Endpoints).
- All service types are assigned a Cluster-IP.
- Every service also creates Endpoint(s), which point to the actual pods. Endpoints are usually referred to as back-ends of a particular service.
01. Create namespace and clear previous data if there is any¶
# If the namespace already exists and contains data from previous steps, let's clean it
kubectl delete namespace codewizard
# Create the desired namespace [codewizard]
kubectl create namespace codewizard
namespace/codewizard created
02. Create the required resources for this hands-on¶
# Network tools pod
kubectl create deployment -n codewizard multitool --image=praqma/network-multitool
deployment.apps/multitool created
# nginx pod
kubectl create deployment -n codewizard nginx --image=nginx
deployment.apps/nginx created
# Verify that the pods are running
kubectl get all -n codewizard
NAME READY STATUS RESTARTS AGE
pod/multitool-74477484b8-bdrwr 1/1 Running 0 29s
pod/nginx-6799fc88d8-p2fjn 1/1 Running 0 7s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/multitool 1/1 1 1 30s
deployment.apps/nginx 1/1 1 1 8s
NAME DESIRED CURRENT READY AGE
replicaset.apps/multitool-74477484b8 1 1 1 30s
replicaset.apps/nginx-6799fc88d8 1 1 1 8s
Service Type: ClusterIP¶
- If not specified, the default service type is ClusterIP.
- In order to expose the deployment as a service, use: --type=ClusterIP
- ClusterIP will expose the pods within the cluster. Since we don’t have an external IP, it will not be reachable from outside the cluster.
- When the service is created, K8S attaches a DNS record to the service in the following format: <service name>.<namespace>.svc.cluster.local
03. Expose the nginx with ClusterIP¶
# Expose the service on port 80
kubectl expose deployment nginx -n codewizard --port 80 --type ClusterIP
service/nginx exposed
# Check the services and see it's type
# Grab the ClusterIP - we will use it in the next steps
kubectl get services -n codewizard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
nginx ClusterIP 10.109.78.182 <none> 80/TCP
04. Test the nginx with ClusterIP¶
- Since the service is a ClusterIP, we will test if we can access the service using the multitool pod.
# Get the name of the multitool pod to be used
kubectl get pods -n codewizard
NAME
multitool-XXXXXX-XXXXX
# Run an interactive shell inside the network-multitool-container (same concept as with Docker)
kubectl exec -it <pod name> -n codewizard -- sh
- Connect to the service in any of the following ways:
Test the nginx with ClusterIP¶
1. Using the IP from the services output, grab the server response:¶
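The curl command is omitted in the source; from inside the multitool pod, using the ClusterIP from the earlier output (yours will differ):

```shell
curl -s 10.109.78.182
```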
# Expected output:
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
2. Test the nginx using the service name - it works since the service name is the DNS name behind the scenes¶
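The command is omitted in the source; inside the multitool pod, the bare service name resolves via cluster DNS:

```shell
curl -s nginx
```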
# Expected output:
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.
</p>
<p>
For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br />
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.
</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
3. Using the full DNS name - every service has an FQDN (fully qualified domain name), so we can use that as well¶
# bash-5.0# curl -s <service name>.<namespace>.svc.cluster.local
bash-5.0# curl -s nginx.codewizard.svc.cluster.local
Service Type: NodePort¶
- NodePort: exposes the Service on each Node’s IP at a static port (the NodePort).
- A ClusterIP Service, to which the NodePort Service routes, is automatically created.
- A NodePort service is reachable from outside the cluster by requesting <Node IP>:<Node Port>.
- The NodePort is allocated from a flag-configured range (default: 30000-32767).
05. Create NodePort¶
1. Delete previous service¶
# Delete the existing service from previous steps
kubectl delete svc nginx -n codewizard
service "nginx" deleted
2. Create NodePort service¶
# As before but this time the type is a NodePort
kubectl expose deployment -n codewizard nginx --port 80 --type NodePort
service/nginx exposed
# Verify that the type is set to NodePort.
# This time you should see ClusterIP and port as well
kubectl get svc -n codewizard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
nginx NodePort 100.65.29.172 <none> 80:32593/TCP
Note the PORT(S) column: 80:32593/TCP.
- 80 is the port the service exposes internally (ClusterIP).
- 32593 is the NodePort (the port exposed on every node).
3. Test the NodePort service¶
To test the service from outside the cluster (e.g., from your local machine), we need two pieces of information:
1. The Node IP: the IP address of one of the cluster nodes.
2. The NodePort: the port allocated to the service (which we saw above).
Step 3.1: Get the Node Port
We can retrieve the allocated NodePort manually from the kubectl get svc output, or programmatically:
# Get the NodePort allocated to the 'nginx' service
kubectl get svc nginx -n codewizard -o jsonpath='{.spec.ports[0].nodePort}{"\n"}'
32593
Step 3.2: Get the Node IP
We need the IP address of a node. In a multi-node cluster, any node’s IP will work.
# List nodes and their IP addresses
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE
minikube Ready control-plane 1d v1.26.1 192.168.49.2 <none> Buildroot 2021.02.4
On Minikube, you can also run `minikube ip` to get this IP directly.
Step 3.3: Access the Service
Now construct the URL using the format http://<NODE_IP>:<NODE_PORT>.
# Example: curl http://192.168.49.2:32593
# Replace with YOUR actual Node IP and Node Port
curl -s http://<NODE_IP>:<NODE_PORT>
# Expected output:
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
<h1>Welcome to nginx!</h1>
...
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Service Type: LoadBalancer¶
Note
We cannot test a LoadBalancer service on localhost; it only works on a cluster whose environment can provision an external IP (e.g., a cloud provider).
06. Create LoadBalancer (only if you are on real cloud)¶
1. Delete previous service¶
# Delete the existing service from previous steps
kubectl delete svc nginx -n codewizard
service "nginx" deleted
2. Create LoadBalancer Service¶
# As before, but this time the type is a LoadBalancer
kubectl expose deployment nginx -n codewizard --port 80 --type LoadBalancer
service/nginx exposed
# On a real cloud we should see an EXTERNAL-IP and we can access the
# service via the internet
kubectl get svc -n codewizard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
nginx LoadBalancer 100.69.15.89 35.205.60.29 80:31354/TCP
3. Test the LoadBalancer Service¶
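On a cloud cluster the test is the same `curl` as before, only against the load balancer's external IP. A minimal sketch (field path per the standard Service status; your IP will differ from the example output above):

```shell
# Fetch the external IP assigned by the cloud load balancer
EXTERNAL_IP=$(kubectl get svc nginx -n codewizard \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Request the nginx welcome page through the load balancer
curl -s http://$EXTERNAL_IP | grep "Welcome to nginx"
```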
Nginx-Ingress¶
- A Kubernetes `ingress` object defines HTTP/HTTPS routes (by DNS hostname) from outside the cluster to services inside it.
- To enable an `ingress` object, we need an `ingress controller`.
- In this lab we will use `Nginx-Ingress` to route external traffic to services inside the cluster.
What will we learn?¶
- How to deploy an application and expose it as a service
- How to configure an Nginx Ingress controller
- How to create SSL certificates and store them as secrets
- How to deploy an Ingress resource with TLS
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster
- Minikube (for the ingress addon) or an existing ingress controller
Important
On a local cluster the ingress will not receive an external IP, so we can only reach it via an explicit http://host:port address.
01. Deploy Sample App¶
- To get started with `Nginx-Ingress`, we will deploy our previous app:
# Create 3 containers
kubectl create deployment ingress-pods --image=nirgeier/k8s-secrets-sample --replicas=3
# Expose the service
kubectl expose deployment ingress-pods --port=5000
02. Deploy default backend¶
- Now let's deploy the default backend for `Nginx-Ingress` (taken from the official site):
apiVersion: apps/v1
kind: Deployment
metadata:
name: default-http-backend
spec:
replicas: 1
selector:
matchLabels:
app: default-http-backend
template:
metadata:
labels:
app: default-http-backend
spec:
terminationGracePeriodSeconds: 60
containers:
- name: default-http-backend
# Any image is permissible as long as:
# 1. It serves a 404 page at /
# 2. It serves 200 on a /healthz endpoint
image: gcr.io/google_containers/defaultbackend:1.0
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
ports:
- containerPort: 8080
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
03. Create service¶
- Next, let’s create the service:
apiVersion: v1
kind: Service
metadata:
name: default-http-backend
spec:
selector:
app: default-http-backend
ports:
- protocol: TCP
port: 80
targetPort: 8080
type: NodePort
04. Import ssl certificate¶
- In this demo we will use a self-signed certificate.
- The certificate is in the same folder as this file.
- The certificate is issued for the hostname: `ingress.local`
# If you wish to create the certificate yourself, use this command.
# ---> The Common Name field is the host you will use later on:
#      Common Name (e.g. server FQDN or YOUR name) []: ingress.local
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout certificate.key -out certificate.crt
# Create a pem file
# The DH parameters are used for the Diffie-Hellman key exchange
openssl dhparam -out certificate.pem 2048
- Store the certificate in secret:
# Store the certificate
kubectl create secret tls tls-certificate --key certificate.key --cert certificate.crt
secret/tls-certificate created
# Store the DH parameters
kubectl create secret generic tls-dhparam --from-file=certificate.pem
secret/tls-dhparam created
05. Deploy the ingress¶
- Now that we have the certificate, we can deploy the `Ingress`:
# Ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-first-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.org/ssl-services: "ingress-pods"
spec:
  tls:
    - hosts:
        # Must match the certificate's Common Name
        - ingress.local
      secretName: tls-certificate
  rules:
    - host: ingress.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ingress-pods
                port:
                  number: 5000
06. Enable the ingress addon¶
- The ingress addon is not enabled by default, so we have to “turn it on”:
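On Minikube (listed in the prerequisites as the source of the ingress addon), enabling it looks like this:

```shell
# Enable the Nginx ingress controller addon on Minikube
minikube addons enable ingress

# Verify the controller pod is running
kubectl get pods -n ingress-nginx
```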
Istio Service Mesh & Kiali¶
- `Istio` is an open-source service mesh that provides a uniform way to manage microservices communication.
- This lab demonstrates a complete Istio service mesh deployment on Kubernetes with Kiali for observability.
- Everything is installed via Helm charts for reproducibility and production-readiness.
What will we learn?¶
- Install and configure Istio service mesh using Helm
- Deploy Kiali, Prometheus, Grafana, and Jaeger as observability addons
- Deploy a microservices demo application with sidecar injection
- Generate live traffic and observe it in Kiali’s topology graph
- Configure traffic management: routing, canary deployments, fault injection
- Enable and verify mutual TLS (mTLS) between services
- Use circuit breakers, timeouts, and rate limiting
- Perform traffic shifting and A/B testing
- Observe distributed traces in Jaeger
- Monitor service metrics in Grafana dashboards
What is Istio?¶
- `Istio` extends Kubernetes to establish a programmable, application-aware network using the Envoy service proxy.
- Istio provides a control plane (Istiod) and a data plane (Envoy sidecars injected into every pod).
- It requires zero application code changes - all features are handled transparently by the mesh.
Core Components¶
| Component | Role | Default Port |
|---|---|---|
| Istiod | Control plane - manages configuration, certificates, service discovery | N/A |
| Envoy | Sidecar proxy injected into each pod - intercepts all pod network traffic | N/A |
| Kiali | Service mesh observability console - topology, health, config validation | 20001 |
| Prometheus | Metrics collection and storage for Istio telemetry | 9090 |
| Grafana | Dashboards for mesh, service, and workload metrics | 3000 |
| Jaeger | Distributed tracing backend and UI | 16686 |
Istio Key CRDs¶
| CRD | Purpose |
|---|---|
| `VirtualService` | Define routing rules: traffic shifting, fault injection, timeouts |
| `DestinationRule` | Define policies after routing: load balancing, circuit breaker, mTLS |
| `Gateway` | Configure load balancer at mesh edge for HTTP/TCP traffic |
| `PeerAuthentication` | Configure mTLS mode: STRICT, PERMISSIVE, DISABLE |
| `AuthorizationPolicy` | Access control policies for workloads |
| `ServiceEntry` | Add external services (outside the mesh) to Istio’s service registry |
Architecture¶
graph TB
ext["External Traffic"] --> gw
subgraph cluster["Kubernetes Cluster"]
subgraph istio["istio-system namespace"]
gw["Istio Ingress Gateway"]
istiod["Istiod\n(control plane)"]
prometheus["Prometheus"]
grafana["Grafana"]
jaeger["Jaeger"]
kiali["Kiali"]
end
subgraph bookinfo["bookinfo namespace (istio-injection=enabled)"]
pp["productpage v1 + Envoy"]
det["details v1 + Envoy"]
rv1["reviews v1\n(no stars)"]
rv2["reviews v2\n(black stars)"]
rv3["reviews v3\n(red stars)"]
rat["ratings v1 + Envoy"]
end
subgraph tgns["traffic-gen namespace"]
tgen["Traffic Generator\n(CronJob)"]
end
gw --> pp
tgen --> gw
pp --> det
pp --> rv1
pp --> rv2
pp --> rv3
rv2 --> rat
rv3 --> rat
istiod -. config/certs .-> pp
istiod -. config/certs .-> det
istiod -. config/certs .-> rv1
pp -. metrics .-> prometheus
det -. metrics .-> prometheus
rv1 -. metrics .-> prometheus
pp -. traces .-> jaeger
prometheus --> kiali
jaeger --> kiali
istiod --> kiali
prometheus --> grafana
end
Directory Structure¶
10-Istio/
├── README.md                        # This file
├── demo.sh                          # Main deployment script (deploy/cleanup)
├── monitor.sh                       # Interactive monitoring & status checks
│
├── scripts/
│   ├── common.sh                    # Shared functions & colors
│   ├── 01-install-istio.sh          # Install Istio via Helm
│   ├── 02-install-addons.sh         # Install Kiali, Prometheus, Grafana, Jaeger
│   ├── 03-deploy-bookinfo.sh        # Deploy Bookinfo sample application
│   ├── 04-traffic-generator.sh      # Deploy live traffic generator
│   └── 05-verify.sh                 # Verify all components
│
├── manifests/
│   ├── namespace.yaml               # bookinfo namespace with injection label
│   ├── bookinfo.yaml                # Bookinfo application manifests
│   ├── bookinfo-gateway.yaml        # Istio Gateway + VirtualService for ingress
│   ├── destination-rules.yaml       # DestinationRules for all service versions
│   ├── traffic-generator.yaml       # CronJob for continuous traffic generation
│   └── addons/                      # Observability addon manifests
│       ├── prometheus.yaml
│       ├── grafana.yaml
│       ├── jaeger.yaml
│       └── kiali.yaml
│
└── istio-features/
    ├── 01-traffic-shifting.yaml     # Canary: route % of traffic to v2/v3
    ├── 02-fault-injection.yaml      # Inject delays and HTTP errors
    ├── 03-circuit-breaker.yaml      # Circuit breaker with connection limits
    ├── 04-request-routing.yaml      # Route by header (user identity)
    ├── 05-timeout-retry.yaml        # Configure timeouts and retries
    ├── 06-mirror-traffic.yaml       # Traffic mirroring / shadow traffic
    ├── 07-mtls-strict.yaml          # Enforce strict mTLS
    └── apply-feature.sh             # Apply/reset feature demos
Prerequisites¶
- Kubernetes cluster (v1.24+) with at least 8 GB RAM available
- `kubectl` configured to access your cluster
- `Helm 3.x` installed
- Nginx Ingress Controller (required for Ingress-based access to dashboards and Bookinfo)
- (Optional) `istioctl` for debugging
# Install kubectl (macOS)
brew install kubectl
# Install Helm
brew install helm
# Install istioctl (optional)
brew install istioctl
# Verify installations
kubectl version --client
helm version
Lab¶
Part 01 - Deploy Istio Service Mesh¶
01. Deploy Everything¶
# Make scripts executable
chmod +x demo.sh monitor.sh scripts/*.sh istio-features/apply-feature.sh
# Deploy Istio + addons + Bookinfo + traffic generator
./demo.sh deploy
The script will:
- Check prerequisites: `kubectl`, `helm`, cluster connectivity
- Install Istio CRDs and control plane via Helm
- Install Kiali, Prometheus, Grafana, and Jaeger
- Create the `bookinfo` namespace with sidecar injection enabled
- Deploy the Bookinfo sample application (4 microservices, multiple versions)
- Configure the Istio Ingress Gateway and DestinationRules
- Start continuous traffic generation via CronJob
- Wait for all pods to be in `Running` state and print access URLs
02. Access the UIs¶
After deployment, open the dashboards using port-forwarding:
# Kiali - Service Mesh Observability Console
kubectl port-forward svc/kiali -n istio-system 20001:20001 &
open http://localhost:20001
# Grafana - Metrics Dashboards
kubectl port-forward svc/grafana -n istio-system 3000:3000 &
open http://localhost:3000
# Jaeger - Distributed Tracing UI
kubectl port-forward svc/tracing -n istio-system 16686:80 &
open http://localhost:16686
# Prometheus - Metrics Queries
kubectl port-forward svc/prometheus -n istio-system 9090:9090 &
open http://localhost:9090
# Bookinfo Application
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80 &
open http://localhost:8080/productpage
03. Explore Kiali¶
- Open Kiali at http://localhost:20001
- Navigate to Graph and select the `bookinfo` namespace
- Observe live traffic flowing between services as coloured edges
- Click on any service to inspect metrics, traces, and health status
- Check Workloads to confirm Envoy sidecar injection on all pods
Part 02 - Bookinfo Application¶
The Bookinfo application consists of four microservices demonstrating multiple service versions:
| Service | Versions | Description |
|---|---|---|
| productpage | v1 | Main frontend - calls details and reviews |
| details | v1 | Book details information |
| reviews | v1, v2, v3 | Book reviews (v1: no stars, v2: black stars, v3: red stars) |
| ratings | v1 | Star ratings (called by reviews v2 and v3 only) |
Application Flow¶
graph LR
user["User"] --> pp["productpage v1"]
pp --> det["details v1"]
pp --> rv1["reviews v1\n(no stars)"]
pp --> rv2["reviews v2\n(black stars)"]
pp --> rv3["reviews v3\n(red stars)"]
rv2 --> rat["ratings v1"]
rv3 --> rat
Note
Each pod in the bookinfo namespace has an Envoy sidecar proxy automatically injected.
All network traffic passes through the sidecar, enabling telemetry, traffic management, and mTLS with zero application changes.
Istio Configuration¶
The lab uses custom Istio settings optimized for demonstrations:
meshConfig:
accessLogFile: /dev/stdout # Enable access logging
enableTracing: true # Enable distributed tracing
defaultConfig:
tracing:
sampling: 100.0 # 100% trace sampling (demo only)
holdApplicationUntilProxyStarts: true
Part 03 - Traffic Management¶
01. Traffic Shifting / Canary Deployment¶
Route a configurable percentage of traffic to different service versions:
graph LR
pp["productpage"] -->|"80%"| rv1["reviews v1\n(no stars)"]
pp -->|"10%"| rv2["reviews v2\n(black stars)"]
pp -->|"10%"| rv3["reviews v3\n(red stars)"]
rv2 --> rat["ratings v1"]
rv3 --> rat
Tip
Observe in Kiali: The Graph view shows weighted edges indicating the traffic split between versions.
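A minimal sketch of the kind of VirtualService that `01-traffic-shifting.yaml` plausibly applies (weights taken from the diagram above; the authoritative manifest lives in `istio-features/`):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: bookinfo
spec:
  hosts:
    - reviews
  http:
    - route:
        # 80% of traffic stays on the stable version
        - destination:
            host: reviews
            subset: v1
          weight: 80
        # 10% canary to each newer version
        - destination:
            host: reviews
            subset: v2
          weight: 10
        - destination:
            host: reviews
            subset: v3
          weight: 10
```

The subsets (`v1`, `v2`, `v3`) must already be defined in the DestinationRules deployed by the lab.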
02. Request Routing (Header-Based)¶
Route specific users to specific service versions based on HTTP headers:
graph LR
jason["User: jason"] -->|"Header: end-user=jason"| pp["productpage"]
other["Other users"] --> pp
pp -->|"jason"| rv2["reviews v2\n(black stars)"]
pp -->|"others"| rv1["reviews v1\n(no stars)"]
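A sketch of what `04-request-routing.yaml` plausibly contains — a header match sends `jason` to v2 while a fallback route keeps everyone else on v1 (exact file contents are in the repo):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: bookinfo
spec:
  hosts:
    - reviews
  http:
    # Requests carrying the header "end-user: jason" go to v2
    - match:
        - headers:
            end-user:
              exact: jason
      route:
        - destination:
            host: reviews
            subset: v2
    # Everyone else stays on v1
    - route:
        - destination:
            host: reviews
            subset: v1
```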
03. Fault Injection¶
Inject failures to test service resilience:
- Injects a 7-second delay for user `jason` on the `ratings` service
- Injects HTTP 500 errors for 10% of all requests to `ratings`
Warning
Observe in Kiali: Error rates appear as red percentages on the graph edges.
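Both faults above map directly to the `fault` stanza of a VirtualService; a sketch of what `02-fault-injection.yaml` plausibly looks like (subset name assumed):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
  namespace: bookinfo
spec:
  hosts:
    - ratings
  http:
    # 7-second delay, only for user jason
    - match:
        - headers:
            end-user:
              exact: jason
      fault:
        delay:
          percentage:
            value: 100.0
          fixedDelay: 7s
      route:
        - destination:
            host: ratings
            subset: v1
    # HTTP 500 for 10% of all other requests
    - fault:
        abort:
          percentage:
            value: 10.0
          httpStatus: 500
      route:
        - destination:
            host: ratings
            subset: v1
```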
04. Circuit Breaker¶
Limit connections to prevent cascading failures across services:
- Max 1 concurrent connection to `reviews`
- Max 1 pending request in the queue
- Circuit trips after 1 consecutive 5xx error
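These limits live in a DestinationRule `trafficPolicy`; a sketch of what `03-circuit-breaker.yaml` plausibly contains (`interval` and `baseEjectionTime` are assumed demo values):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
  namespace: bookinfo
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1          # max 1 concurrent connection
      http:
        http1MaxPendingRequests: 1 # max 1 pending request in the queue
    outlierDetection:
      consecutive5xxErrors: 1      # trip after 1 consecutive 5xx error
      interval: 1s                 # scan interval (assumed)
      baseEjectionTime: 3m         # how long a tripped host is ejected (assumed)
```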
05. Timeouts and Retries¶
Configure request timeouts and automatic retries at the mesh level:
- 3-second timeout on requests to the `reviews` service
- 2 automatic retries on failure (5xx errors, connect failures)
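Timeouts and retries are both route-level settings in a VirtualService; a sketch of what `05-timeout-retry.yaml` plausibly contains:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: bookinfo
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v1
      timeout: 3s            # fail the request after 3 seconds
      retries:
        attempts: 2          # up to 2 automatic retries
        retryOn: 5xx,connect-failure
```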
06. Traffic Mirroring¶
Shadow production traffic to a test version without affecting real users:
graph LR
pp["productpage"] -->|"live traffic"| rv1["reviews v1"]
pp -. "mirrored copy" .-> rv3["reviews v3\n(shadow)"]
rv1 --> rat["ratings v1"]
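Mirroring is configured on the live route via the `mirror` field; a sketch of what `06-mirror-traffic.yaml` plausibly contains:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: bookinfo
spec:
  hosts:
    - reviews
  http:
    - route:
        # all live traffic is served by v1
        - destination:
            host: reviews
            subset: v1
          weight: 100
      # a copy of each request is shadowed to v3; responses are discarded
      mirror:
        host: reviews
        subset: v3
      mirrorPercentage:
        value: 100.0
```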
07. Mutual TLS (mTLS)¶
Enforce encrypted service-to-service communication across the namespace:
- Enables STRICT mTLS mode for the `bookinfo` namespace
- All inter-service traffic must be encrypted via Istio-managed certificates
- Non-mesh (plain TCP) traffic is rejected
Tip
Verify in Kiali: The Security view shows lock icons on all edges of the graph.
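Enforcing STRICT mode takes a single PeerAuthentication resource; a sketch of what `07-mtls-strict.yaml` plausibly contains:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: bookinfo
spec:
  mtls:
    mode: STRICT   # reject any non-mTLS (plain TCP) traffic
```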
Reset to Default¶
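The feature demos are removed with the helper script shipped in `istio-features/`:

```shell
# Restore the default routing (removes all feature demo overrides)
./istio-features/apply-feature.sh reset
```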
Part 04 - Observability¶
Prometheus Queries¶
Useful PromQL queries for Istio service mesh metrics:
# Request rate by destination service
rate(istio_requests_total{reporter="destination"}[5m])
# P99 latency per service
histogram_quantile(0.99,
sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m]))
by (le, destination_service)
)
# Error rate per destination service
sum(rate(istio_requests_total{reporter="destination", response_code=~"5.*"}[5m])) by (destination_service)
/
sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service)
# TCP bytes sent
sum(rate(istio_tcp_sent_bytes_total[5m])) by (destination_service)
Grafana Dashboards¶
Pre-configured Istio dashboards available out of the box:
| Dashboard | Description |
|---|---|
| Istio Mesh Dashboard | Overall mesh health and performance overview |
| Istio Service Dashboard | Per-service request rates, latencies, error rates |
| Istio Workload Dashboard | Per-workload (pod) metrics |
| Istio Control Plane Dashboard | Istiod resource usage and performance |
Jaeger Distributed Tracing¶
- Open Jaeger at http://localhost:16686
- Select service `productpage.bookinfo` from the dropdown
- Click Find Traces to list recent requests
- Examine a trace to see the full end-to-end path across all microservices
- Compare latencies to identify bottlenecks between service versions
Monitor Script¶
# Interactive mode
./monitor.sh
# Quick summary
./monitor.sh summary
# Test connectivity to all components
./monitor.sh test
# Full detailed report
./monitor.sh full
Part 05 - Troubleshooting¶
Pods Not Starting¶
# Check events for clues
kubectl get events -n bookinfo --sort-by='.lastTimestamp'
kubectl get events -n istio-system --sort-by='.lastTimestamp'
# Describe a specific pod
kubectl describe pod <pod-name> -n bookinfo
Sidecar Not Injected¶
# Verify the namespace injection label
kubectl get namespace bookinfo --show-labels
# Expected: istio-injection=enabled
# If the label is missing, add it:
kubectl label namespace bookinfo istio-injection=enabled --overwrite
# Restart deployments to trigger sidecar injection
kubectl rollout restart deployment -n bookinfo
No Traffic Visible in Kiali¶
# Verify the traffic generator CronJob is running
kubectl get cronjob -n traffic-gen
kubectl get jobs -n traffic-gen --sort-by=.metadata.creationTimestamp | tail -5
# Confirm productpage is reachable from within the cluster
kubectl exec -n bookinfo deploy/productpage-v1 -- \
curl -s http://localhost:9080/productpage | head -20
# Verify Prometheus is collecting Istio metrics
kubectl exec -n istio-system deploy/prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=istio_requests_total' | head -50
Note
Wait 1β2 minutes after deploying the traffic generator for metrics to propagate into Prometheus and Kiali.
Istio Configuration Issues¶
# Analyze Istio configuration for problems
istioctl analyze -n bookinfo
# Check proxy sync status for all pods
istioctl proxy-status
# Inspect proxy routing config for a specific pod
istioctl proxy-config routes deploy/productpage-v1 -n bookinfo
Kiali Not Showing Data¶
# Confirm Prometheus is running
kubectl get pods -n istio-system -l app=prometheus
# Check Kiali logs for errors
kubectl logs -n istio-system -l app=kiali --tail=50
Part 06 - Cleanup¶
Full Cleanup¶
This will remove:
- `traffic-gen` namespace and all traffic generator resources
- `bookinfo` namespace and all application resources
- Kiali, Prometheus, Grafana, and Jaeger Helm releases
- Istio control plane Helm release
- All Istio CRDs
- All remaining namespaces created by this lab
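The directory structure describes `demo.sh` as the deploy/cleanup script, so the full teardown is presumably:

```shell
# Tear down everything deployed by this lab
./demo.sh cleanup
```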
Partial Cleanup¶
# Remove only the Bookinfo app (keep Istio + addons running)
kubectl delete namespace bookinfo
kubectl delete namespace traffic-gen
# Remove only Istio feature demos (restore default routing)
./istio-features/apply-feature.sh reset
# Remove only observability addons (keep Istio + app running)
kubectl delete -f manifests/addons/ -n istio-system
Resources¶
Security
RBAC - Role-Based Access Control¶
- In this lab we will learn how Kubernetes Role-Based Access Control (RBAC) works and how to use Roles, ClusterRoles, RoleBindings, ClusterRoleBindings, and ServiceAccounts to control who can do what inside a cluster.
What will we learn?¶
- What RBAC is and why it is essential for Kubernetes security
- The four RBAC API objects: `Role`, `ClusterRole`, `RoleBinding`, `ClusterRoleBinding`
- How to create and assign fine-grained permissions to users and ServiceAccounts
- How to test permissions with `kubectl auth can-i`
- How to grant a pod access to the Kubernetes API using a ServiceAccount
- Difference between namespace-scoped and cluster-scoped permissions
- Best practices: principle of least privilege
Official Documentation & References¶
| Resource | Link |
|---|---|
| RBAC Authorization | kubernetes.io/docs |
| Using RBAC Authorization | kubernetes.io/docs |
| ServiceAccounts | kubernetes.io/docs |
| kubectl auth can-i | kubernetes.io/docs |
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster
RBAC Overview¶
graph LR
subgraph identity["Who?"]
user["User / Group"]
sa["ServiceAccount"]
end
subgraph binding["Binding"]
rb["RoleBinding\n(namespace-scoped)"]
crb["ClusterRoleBinding\n(cluster-scoped)"]
end
subgraph permission["What can they do?"]
role["Role\n(namespace-scoped)"]
cr["ClusterRole\n(cluster-scoped)"]
end
user --> rb
sa --> rb
user --> crb
sa --> crb
rb --> role
rb --> cr
crb --> cr
| Object | Scope | Purpose |
|---|---|---|
| `Role` | Namespace | Defines a set of permissions within a namespace |
| `ClusterRole` | Cluster | Defines a set of permissions cluster-wide |
| `RoleBinding` | Namespace | Grants a Role/ClusterRole to a subject within a namespace |
| `ClusterRoleBinding` | Cluster | Grants a ClusterRole to a subject cluster-wide |
| `ServiceAccount` | Namespace | Identity for processes running in pods |
01. Create namespace¶
# Clean up if it already exists
kubectl delete namespace rbac-lab --ignore-not-found
# Create the lab namespace
kubectl create namespace rbac-lab
02. Create a Role (namespace-scoped)¶
A Role grants permissions within a specific namespace. Create a Role that allows read-only access to pods:
# manifests/role-pod-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: rbac-lab
name: pod-reader
rules:
- apiGroups: [""] # "" = core API group
resources: ["pods"]
verbs: ["get", "watch", "list"]
Understanding Rules
- apiGroups: `""` is the core API group (pods, services, configmaps). Use `"apps"` for deployments, `"batch"` for jobs, etc.
- resources: Kubernetes resource types (pods, services, deployments, secrets, etc.)
- verbs: Actions - `get`, `list`, `watch`, `create`, `update`, `patch`, `delete`
03. Create a ServiceAccount¶
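The RoleBinding in the next step references a ServiceAccount named `app-reader`, so create it first:

```shell
# Create the ServiceAccount that the RoleBinding will reference
kubectl create serviceaccount app-reader -n rbac-lab

# Verify it exists
kubectl get serviceaccount app-reader -n rbac-lab
```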
04. Create a RoleBinding¶
Bind the pod-reader Role to our app-reader ServiceAccount:
# manifests/rolebinding-pod-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: read-pods
namespace: rbac-lab
subjects:
- kind: ServiceAccount
name: app-reader
namespace: rbac-lab
roleRef:
kind: Role
name: pod-reader
apiGroup: rbac.authorization.k8s.io
05. Test permissions with kubectl auth can-i¶
# Check: can the ServiceAccount list pods in rbac-lab?
kubectl auth can-i list pods \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:app-reader
# Expected output: yes
# Check: can it delete pods? (should be denied)
kubectl auth can-i delete pods \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:app-reader
# Expected output: no
# Check: can it list pods in the default namespace? (should be denied)
kubectl auth can-i list pods \
--namespace default \
--as system:serviceaccount:rbac-lab:app-reader
# Expected output: no
06. Use a ServiceAccount in a Pod¶
Deploy a pod that uses the app-reader ServiceAccount to query the Kubernetes API from within the pod:
# manifests/pod-with-sa.yaml
apiVersion: v1
kind: Pod
metadata:
name: api-explorer
namespace: rbac-lab
spec:
serviceAccountName: app-reader
containers:
- name: kubectl
image: bitnami/kubectl:latest
command: ["sleep", "3600"]
kubectl apply -f manifests/pod-with-sa.yaml
# Wait for the pod to be running
kubectl wait --for=condition=Ready pod/api-explorer -n rbac-lab --timeout=60s
Now exec into the pod and test the API access:
# Exec into the pod
kubectl exec -it api-explorer -n rbac-lab -- bash
# Inside the pod - list pods (should work)
kubectl get pods -n rbac-lab
# Inside the pod - try to delete a pod (should fail with Forbidden)
kubectl delete pod api-explorer -n rbac-lab
# Inside the pod - try to list services (should fail - not in our Role)
kubectl get services -n rbac-lab
# Exit the pod
exit
07. Create a ClusterRole and ClusterRoleBinding¶
A ClusterRole + ClusterRoleBinding grants permissions across all namespaces:
# manifests/clusterrole-namespace-viewer.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: namespace-viewer
rules:
- apiGroups: [""]
resources: ["namespaces"]
verbs: ["get", "list", "watch"]
# manifests/clusterrolebinding-namespace-viewer.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: view-namespaces
subjects:
- kind: ServiceAccount
name: app-reader
namespace: rbac-lab
roleRef:
kind: ClusterRole
name: namespace-viewer
apiGroup: rbac.authorization.k8s.io
kubectl apply -f manifests/clusterrole-namespace-viewer.yaml
kubectl apply -f manifests/clusterrolebinding-namespace-viewer.yaml
Test it:
# Can now list namespaces cluster-wide
kubectl auth can-i list namespaces \
--as system:serviceaccount:rbac-lab:app-reader
# Expected output: yes
08. Aggregate ClusterRoles¶
Kubernetes supports aggregation - automatically combining ClusterRoles via labels. The built-in view, edit, and admin ClusterRoles use this pattern:
# manifests/clusterrole-custom-view.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: custom-metrics-viewer
labels:
# This label makes it auto-aggregate into the built-in "view" ClusterRole
rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
- apiGroups: ["metrics.k8s.io"]
resources: ["pods", "nodes"]
verbs: ["get", "list"]
kubectl apply -f manifests/clusterrole-custom-view.yaml
# Verify it was aggregated into the "view" ClusterRole
kubectl get clusterrole view -o yaml | grep -A5 "aggregationRule"
09. Explore default ClusterRoles¶
Kubernetes ships with several built-in ClusterRoles:
# List all ClusterRoles
kubectl get clusterroles
# Inspect the built-in "view" role (read-only across most resources)
kubectl describe clusterrole view
# Inspect the built-in "edit" role (read-write but no RBAC changes)
kubectl describe clusterrole edit
# Inspect the built-in "admin" role (full access within a namespace)
kubectl describe clusterrole admin
# Inspect "cluster-admin" (full access to everything)
kubectl describe clusterrole cluster-admin
| Built-in ClusterRole | Permissions |
|---|---|
| `view` | Read-only access to most resources (no secrets) |
| `edit` | Read-write access to most resources (no RBAC or namespace changes) |
| `admin` | Full control within a namespace (including RBAC) |
| `cluster-admin` | Unrestricted access to everything (use with extreme caution!) |
Security Best Practice
Never bind cluster-admin to application ServiceAccounts. Follow the principle of least privilege - grant only the minimum permissions required.
10. Cleanup¶
kubectl delete namespace rbac-lab
kubectl delete clusterrole namespace-viewer custom-metrics-viewer --ignore-not-found
kubectl delete clusterrolebinding view-namespaces --ignore-not-found
Summary¶
| Concept | Key Takeaway |
|---|---|
| Role | Namespace-scoped permissions |
| ClusterRole | Cluster-scoped permissions (or reusable across NS) |
| RoleBinding | Assigns Role/ClusterRole within a namespace |
| ClusterRoleBinding | Assigns ClusterRole cluster-wide |
| ServiceAccount | Pod identity - attach Roles to pods via ServiceAccounts |
| `kubectl auth can-i` | Test permissions without trial-and-error |
| Principle of Least Privilege | Always grant the minimum permissions required |
Exercises¶
The following exercises will test your understanding of Kubernetes RBAC. Try to solve each exercise on your own before revealing the solution.
01. Create a Role That Grants Full Access to ConfigMaps¶
Create a Role named configmap-admin in the rbac-lab namespace that allows all operations (get, list, watch, create, update, patch, delete) on ConfigMaps. Bind it to a new ServiceAccount named config-manager.
Scenario:¶
- Your application needs to dynamically create and update ConfigMaps for feature flags.
- The ServiceAccount must have full CRUD access to ConfigMaps but nothing else.
Hint: Use kubectl create role with --verb='*' or list all verbs, and kubectl create rolebinding to bind it.
Solution
## Create the namespace (if not already created)
kubectl create namespace rbac-lab --dry-run=client -o yaml | kubectl apply -f -
## Create the ServiceAccount
kubectl create serviceaccount config-manager -n rbac-lab
## Create the Role with full ConfigMap access
kubectl create role configmap-admin \
--namespace rbac-lab \
--verb=get,list,watch,create,update,patch,delete \
--resource=configmaps
## Bind the Role to the ServiceAccount
kubectl create rolebinding configmap-admin-binding \
--namespace rbac-lab \
--role=configmap-admin \
--serviceaccount=rbac-lab:config-manager
## Test: can the ServiceAccount create configmaps?
kubectl auth can-i create configmaps \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:config-manager
## Expected: yes
## Test: can it delete configmaps?
kubectl auth can-i delete configmaps \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:config-manager
## Expected: yes
## Test: can it access secrets? (should be denied)
kubectl auth can-i get secrets \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:config-manager
## Expected: no
## Clean up
kubectl delete rolebinding configmap-admin-binding -n rbac-lab
kubectl delete role configmap-admin -n rbac-lab
kubectl delete serviceaccount config-manager -n rbac-lab
02. Use kubectl auth can-i --list to Audit Permissions¶
List all permissions that the app-reader ServiceAccount has in the rbac-lab namespace and cluster-wide. Identify which permissions come from the Role vs the ClusterRole.
Scenario:¶
- A security audit requires you to document all permissions granted to a ServiceAccount.
- You need to distinguish between namespace-scoped and cluster-scoped permissions.
Hint: Use kubectl auth can-i --list --namespace rbac-lab --as system:serviceaccount:rbac-lab:app-reader.
Solution
## Ensure the namespace and bindings exist (from the main lab)
kubectl create namespace rbac-lab --dry-run=client -o yaml | kubectl apply -f -
kubectl create serviceaccount app-reader -n rbac-lab --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f manifests/role-pod-reader.yaml
kubectl apply -f manifests/rolebinding-pod-reader.yaml
kubectl apply -f manifests/clusterrole-namespace-viewer.yaml
kubectl apply -f manifests/clusterrolebinding-namespace-viewer.yaml
## List namespace-scoped permissions in rbac-lab
kubectl auth can-i --list \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:app-reader
## Expected output includes:
## pods [] [] [get watch list] <-- from Role pod-reader
## namespaces [] [] [get list watch] <-- from ClusterRole namespace-viewer
## List cluster-scoped permissions (no namespace)
kubectl auth can-i --list \
--as system:serviceaccount:rbac-lab:app-reader
## Expected output includes:
## namespaces [] [] [get list watch] <-- from ClusterRoleBinding
## Check a specific permission
kubectl auth can-i list pods \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:app-reader
## Expected: yes
kubectl auth can-i list pods \
--namespace default \
--as system:serviceaccount:rbac-lab:app-reader
## Expected: no (pod-reader Role is only in rbac-lab)
03. Create a ClusterRole That Allows Reading Logs¶
Create a ClusterRole named log-reader that grants access to pods/log (a subresource). Bind it to a ServiceAccount log-collector in the rbac-lab namespace using a RoleBinding (not a ClusterRoleBinding) to limit it to the namespace.
Scenario:¶
- Your centralized logging agent needs to read pod logs but only in a specific namespace.
- You want to reuse a ClusterRole via a namespace-scoped RoleBinding.
Hint: Use pods/log as the resource in the ClusterRole. A RoleBinding can reference a ClusterRole but limits its scope to the binding’s namespace.
Solution
## Create the ServiceAccount
kubectl create serviceaccount log-collector -n rbac-lab
## Create a ClusterRole for reading pod logs
cat <<'EOF' | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: log-reader
rules:
- apiGroups: [""]
resources: ["pods", "pods/log"]
verbs: ["get", "list"]
EOF
## Use a RoleBinding (not ClusterRoleBinding) to limit scope to rbac-lab
kubectl create rolebinding log-reader-binding \
--namespace rbac-lab \
--clusterrole=log-reader \
--serviceaccount=rbac-lab:log-collector
## Test: can read logs in rbac-lab?
kubectl auth can-i get pods/log \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:log-collector
## Expected: yes
## Test: can read logs in default namespace? (should be denied)
kubectl auth can-i get pods/log \
--namespace default \
--as system:serviceaccount:rbac-lab:log-collector
## Expected: no
## Clean up
kubectl delete rolebinding log-reader-binding -n rbac-lab
kubectl delete clusterrole log-reader
kubectl delete serviceaccount log-collector -n rbac-lab
04. Restrict a ServiceAccount to Only exec into Pods¶
Create a Role that grants only create access on the pods/exec subresource, and basic get on pods. Bind it to a new ServiceAccount and verify it can exec but cannot delete or list pods.
Scenario:¶
- A debugging tool needs to exec into running pods for troubleshooting.
- It should not be able to list, delete, or modify pods - only exec into them.
Hint: Use two rules in the Role: one for pods with get, and one for pods/exec with create.
Solution
## Create the ServiceAccount
kubectl create serviceaccount exec-debugger -n rbac-lab
## Create the Role
cat <<'EOF' | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: rbac-lab
name: exec-only
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get"]
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create"]
EOF
## Bind the Role
kubectl create rolebinding exec-debugger-binding \
--namespace rbac-lab \
--role=exec-only \
--serviceaccount=rbac-lab:exec-debugger
## Test: can exec?
kubectl auth can-i create pods/exec \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:exec-debugger
## Expected: yes
## Test: can get pods?
kubectl auth can-i get pods \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:exec-debugger
## Expected: yes
## Test: can list pods? (should be denied)
kubectl auth can-i list pods \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:exec-debugger
## Expected: no
## Test: can delete pods? (should be denied)
kubectl auth can-i delete pods \
--namespace rbac-lab \
--as system:serviceaccount:rbac-lab:exec-debugger
## Expected: no
## Clean up
kubectl delete rolebinding exec-debugger-binding -n rbac-lab
kubectl delete role exec-only -n rbac-lab
kubectl delete serviceaccount exec-debugger -n rbac-lab
05. Verify That a Pod Uses the Correct ServiceAccount Token¶
Deploy a pod with a custom ServiceAccount and verify from inside the pod that it uses the correct token by querying the Kubernetes API directly (without kubectl).
Scenario:¶
- You need to verify that a pod’s ServiceAccount token is correctly mounted and functional.
- The pod should be able to authenticate to the Kubernetes API using its token.
Hint: The ServiceAccount token is mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. Use curl with the token to hit the API server.
Solution
## Create the ServiceAccount and Role (reuse from main lab)
kubectl create serviceaccount app-reader -n rbac-lab --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f manifests/role-pod-reader.yaml
kubectl apply -f manifests/rolebinding-pod-reader.yaml
## Deploy a pod with curl available
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: token-verifier
namespace: rbac-lab
spec:
serviceAccountName: app-reader
containers:
- name: curl
image: curlimages/curl:latest
command: ["sleep", "3600"]
EOF
kubectl wait --for=condition=Ready pod/token-verifier -n rbac-lab --timeout=60s
## Verify the token is mounted
kubectl exec token-verifier -n rbac-lab -- \
ls /var/run/secrets/kubernetes.io/serviceaccount/
## Read the namespace the token is scoped to
kubectl exec token-verifier -n rbac-lab -- \
cat /var/run/secrets/kubernetes.io/serviceaccount/namespace
echo ## Newline
## Expected: rbac-lab
## Use the token to query the API server
kubectl exec token-verifier -n rbac-lab -- sh -c '
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
curl -s --cacert $CACERT \
-H "Authorization: Bearer $TOKEN" \
https://kubernetes.default.svc/api/v1/namespaces/rbac-lab/pods | head -20
'
## Expected: JSON response listing pods in rbac-lab
## Clean up
kubectl delete pod token-verifier -n rbac-lab
Troubleshooting¶
- Permission denied (Forbidden):
Verify the Role, RoleBinding, and subject names match exactly:
## Check Role exists and has correct rules
kubectl get role pod-reader -n rbac-lab -o yaml
## Check RoleBinding exists and references the correct Role and subject
kubectl get rolebinding read-pods -n rbac-lab -o yaml
## Common mistake: ServiceAccount name or namespace mismatch
kubectl get serviceaccount -n rbac-lab
- kubectl auth can-i returns unexpected results:
Ensure you are using the correct impersonation format:
## Correct format for ServiceAccounts:
kubectl auth can-i list pods \
--as system:serviceaccount:<namespace>:<sa-name>
## Correct format for users:
kubectl auth can-i list pods --as <username>
## List all permissions for a ServiceAccount
kubectl auth can-i --list --as system:serviceaccount:rbac-lab:app-reader -n rbac-lab
- ClusterRole not working across namespaces:
Ensure you used a ClusterRoleBinding (not a RoleBinding). A RoleBinding limits a ClusterRole to a single namespace:
## Check the binding type
kubectl get clusterrolebinding view-namespaces -o yaml
## If it's a RoleBinding, it only works in the binding's namespace
kubectl get rolebinding -n rbac-lab
- Pod cannot access the Kubernetes API:
Check the ServiceAccount token is mounted and the pod identity is correct:
## Check if the pod uses the expected ServiceAccount
kubectl get pod api-explorer -n rbac-lab -o jsonpath='{.spec.serviceAccountName}'
echo
## Check if the ServiceAccount has the expected bindings
kubectl get rolebinding,clusterrolebinding -A -o wide | grep app-reader
- Aggregated ClusterRole not working:
Verify the label matches the aggregation label selector:
## Check the aggregation rule on the target ClusterRole
kubectl get clusterrole view -o yaml | grep -A5 aggregationRule
## The label must match exactly
kubectl get clusterrole custom-metrics-viewer -o yaml | grep -A2 labels
Next Steps¶
- Explore OPA Gatekeeper or Kyverno for policy enforcement beyond RBAC.
- Learn about Pod Security Standards and Pod Security Admission to control what pods can do.
- Set up Kubernetes Audit Logging to monitor RBAC events (who accessed what and when).
- Integrate RBAC with external identity providers (OIDC, LDAP) using Dex or your cloud provider’s IAM integration.
- Explore Hierarchical Namespaces for multi-tenant RBAC patterns.
- Study the RBAC Good Practices guide from the official Kubernetes documentation.
Kubernetes Secrets¶
- Welcome to the Kubernetes Secrets hands-on lab! In this tutorial, you’ll learn everything about Kubernetes Secrets – how to create, manage, consume, secure, and rotate them.
- Secrets are first-class Kubernetes objects designed to hold sensitive data such as passwords, OAuth tokens, TLS certificates, and SSH keys.
- You’ll gain practical experience creating Secrets imperatively and declaratively, mounting them into Pods as environment variables and volumes, working with TLS and docker-registry Secrets, using projected volumes, making Secrets immutable, and enabling encryption at rest.
What will we learn?¶
- What Kubernetes Secrets are and why they exist
- The different types of Secrets and when to use each one
- How Secrets differ from ConfigMaps
- How to create Secrets imperatively (from literals, files, and env-files)
- How to create Secrets declaratively using YAML manifests
- How to mount Secrets as environment variables in Pods
- How to mount Secrets as files (volumes) in Pods
- How to create and use docker-registry Secrets for private image registries
- How to create and use TLS Secrets for HTTPS termination
- How to use projected volumes to combine Secrets and ConfigMaps
- How to make Secrets immutable for safety and performance
- How to rotate Secrets and trigger Pod restarts
- How to enable encryption at rest with EncryptionConfiguration
- Security best practices and common pitfalls
Official Documentation & References¶
| Resource | Link |
|---|---|
| Kubernetes Secrets | kubernetes.io/docs/concepts/configuration/secret |
| Managing Secrets with kubectl | kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-kubectl |
| Managing Secrets with Config File | kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-config-file |
| Distribute Credentials via Secrets | kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure |
| Encrypting Secrets at Rest | kubernetes.io/docs/tasks/administer-cluster/encrypt-data |
| Good Practices for Secrets | kubernetes.io/docs/concepts/security/secrets-good-practices |
| Projected Volumes | kubernetes.io/docs/concepts/storage/projected-volumes |
| Pull Image from Private Registry | kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry |
| TLS Secrets | kubernetes.io/docs/concepts/configuration/secret/#tls-secrets |
| External Secrets Operator | external-secrets.io |
| Sealed Secrets (Bitnami) | github.com/bitnami-labs/sealed-secrets |
| HashiCorp Vault | vaultproject.io |
Introduction¶
What Are Kubernetes Secrets?¶
- A Secret is a Kubernetes object that holds a small amount of sensitive data, such as a password, a token, or a key.
- Secrets decouple sensitive information from Pod specs and container images, reducing the risk of accidental exposure.
- Without Secrets, you would need to embed credentials directly in Pod manifests, Dockerfiles, or application code – all of which are insecure practices.
- Secrets are stored in etcd, the Kubernetes cluster’s key-value store, and are made available to Pods through environment variables or volume mounts.
Secret Types¶
Kubernetes supports several built-in Secret types, each designed for a specific use case:
| Type | Description | Usage |
|---|---|---|
| Opaque | Generic Secret for arbitrary key-value pairs (default type) | Passwords, API keys, connection strings |
| kubernetes.io/dockerconfigjson | Docker registry credentials for pulling private images | imagePullSecrets in Pod specs |
| kubernetes.io/tls | TLS certificate and private key pair | HTTPS termination, Ingress TLS |
| kubernetes.io/basic-auth | Credentials for basic HTTP authentication | Username/password for HTTP auth |
| kubernetes.io/ssh-auth | SSH private key for authentication | Git clones, SSH connections |
| bootstrap.kubernetes.io/token | Bootstrap token for node joining | kubeadm join operations |
| kubernetes.io/service-account-token | Service account token (auto-created by Kubernetes) | Pod-to-API-server authentication |
Default Type
If you do not specify a type when creating a Secret, Kubernetes defaults to Opaque. This is the most common type and accepts any arbitrary data.
Secrets vs. ConfigMaps¶
Both Secrets and ConfigMaps store configuration data, but they serve different purposes:
| Feature | Secret | ConfigMap |
|---|---|---|
| Purpose | Sensitive data (passwords, tokens, keys) | Non-sensitive configuration (settings, properties) |
| Data encoding | Values stored as base64-encoded strings | Values stored as plain text |
| Size limit | 1 MiB per Secret | 1 MiB per ConfigMap |
| RBAC | Typically restricted with fine-grained RBAC | Often more broadly accessible |
| tmpfs mounting | Mounted in tmpfs (RAM) – never written to disk on nodes | Mounted on disk |
| Encryption | Can be encrypted at rest in etcd | Not encrypted at rest by default |
| Environment vars | Supported via secretKeyRef | Supported via configMapKeyRef |
| Volume mounts | Supported (files in tmpfs) | Supported (files on disk) |
Base64 Is NOT Encryption
Kubernetes stores Secret values as base64-encoded strings. Base64 is an encoding scheme, not an encryption algorithm. Anyone with get access to Secrets can decode them trivially with base64 --decode. Always combine Secrets with proper RBAC and encryption at rest.
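To see just how trivial the decoding is, here is the same round-trip in Python (the password is the sample value used throughout this lab):

```python
import base64

# Encode a plain-text value the way it would appear under a Secret's 'data' field
encoded = base64.b64encode(b"S3cur3P@ssw0rd!").decode()
print(encoded)  # UzNjdXIzUEBzc3cwcmQh

# Anyone who can read the Secret object can reverse it just as easily
decoded = base64.b64decode(encoded).decode()
print(decoded)  # S3cur3P@ssw0rd!
```

This is exactly equivalent to piping the value through `base64 --decode` in the shell.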
Secret Encoding: base64 vs. Encryption at Rest¶
Understanding the difference between encoding and encryption is critical:
- base64 encoding: Secret values in the data field must be base64-encoded. This is a reversible encoding (not encryption) that allows binary data to be represented as text. You can use the stringData field to provide plain-text values that Kubernetes will automatically base64-encode.
- Encryption at rest: By default, Secrets are stored unencrypted in etcd. Anyone with access to etcd can read all Secrets. To protect Secrets in etcd, you must enable encryption at rest by writing an EncryptionConfiguration file and pointing the API server at it with --encryption-provider-config.
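The EncryptionConfiguration mentioned above is a file read by the API server at startup, not an object you apply to the cluster. A minimal sketch of what such a file looks like (the key value is a placeholder – generate a real one, e.g. with head -c 32 /dev/urandom | base64):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      ## The first provider is used for new writes; later providers
      ## are tried in order when reading existing data
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   ## placeholder
      ## identity allows reading Secrets written before encryption was enabled
      - identity: {}
```

Note that existing Secrets are only re-encrypted when they are rewritten; the official docs suggest a bulk rewrite such as kubectl get secrets -A -o json | kubectl replace -f -.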
Security Best Practices¶
Critical Security Considerations
- Enable RBAC: Restrict who can get, list, and watch Secrets. A user who can list Secrets in a namespace can see all Secret data.
- Enable encryption at rest: Configure the API server with --encryption-provider-config to encrypt Secrets in etcd.
- Avoid Secrets in Git: Never commit Secret manifests with real credentials to version control. Use tools like Sealed Secrets, External Secrets Operator, or SOPS.
- Use least privilege: Grant only the minimum RBAC permissions needed. Avoid giving * (wildcard) access to Secrets.
- Rotate Secrets regularly: Establish a rotation schedule and automate the process.
- Prefer volume mounts over env vars: Environment variables can leak through crash dumps, logs, or child processes. Volume-mounted Secrets are more secure.
- Use immutable Secrets: Mark Secrets as immutable: true when the data should never change, improving security and API server performance.
- Audit Secret access: Enable Kubernetes audit logging to track who accesses Secrets and when.
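The least-privilege bullet can be made concrete with resourceNames. A sketch of a Role that allows reading one specific Secret and nothing else (names are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-app-credentials
  namespace: secrets-lab
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["app-credentials"]   ## only this Secret, not all Secrets
    verbs: ["get"]                       ## no list/watch: cannot enumerate Secrets
```

Because resourceNames does not apply to list, omitting the list verb is what actually prevents the subject from discovering other Secrets in the namespace.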
Secret Lifecycle¶
The following diagram illustrates how Secrets flow from creation to consumption in a Kubernetes cluster:
flowchart TB
subgraph Creation ["Secret Creation"]
A["kubectl create secret"] --> D["API Server"]
B["YAML Manifest\n(stringData / data)"] --> D
C["External Secrets\nOperator / Vault"] --> D
end
subgraph Storage ["Storage Layer"]
D --> E{"Encryption\nat Rest?"}
E -->|Yes| F["Encrypted in etcd"]
E -->|No| G["Plain base64 in etcd"]
end
subgraph Consumption ["Pod Consumption"]
F --> H["kubelet fetches Secret"]
G --> H
H --> I["Environment\nVariables"]
H --> J["Volume Mounts\n(tmpfs)"]
H --> K["imagePullSecrets"]
end
subgraph Pod ["Pod Runtime"]
I --> L["Container Process\nreads env var"]
J --> M["Container Process\nreads file"]
K --> N["kubelet pulls\nprivate image"]
end
style Creation fill:#e1f5fe,stroke:#0277bd
style Storage fill:#fff3e0,stroke:#ef6c00
style Consumption fill:#e8f5e9,stroke:#2e7d32
style Pod fill:#f3e5f5,stroke:#7b1fa2
Prerequisites¶
- A running Kubernetes cluster (minikube, kind, k3d, Docker Desktop, or a cloud-managed cluster)
- kubectl installed and configured to communicate with your cluster
- openssl installed (for generating TLS certificates in Step 07)
- Basic familiarity with Kubernetes Pods, Deployments, and YAML manifests
Verify your cluster is accessible:
## Verify kubectl is configured and the cluster is reachable
kubectl cluster-info
## Verify you can list namespaces
kubectl get namespaces
Lab¶
Step 01 - Create the Lab Namespace¶
- Before we begin working with Secrets, let’s create a dedicated namespace to keep our lab resources isolated.
## Create the secrets-lab namespace from the manifest file
kubectl apply -f manifests/namespace.yaml
## Verify the namespace was created
kubectl get namespace secrets-lab
## Set the default namespace for this lab session so we don't need
## to add -n secrets-lab to every command
kubectl config set-context --current --namespace=secrets-lab
Using a Dedicated Namespace
Working in a dedicated namespace makes cleanup easy – just delete the namespace at the end and all resources are removed. It also provides an isolation boundary for RBAC policies.
Step 02 - Create Secrets Imperatively¶
- The fastest way to create Secrets is with kubectl create secret directly from the command line.
- Kubernetes supports creating Secrets from literal values, from files, and from environment files.
From Literal Values¶
## Create an Opaque Secret with literal key-value pairs
## The --from-literal flag accepts key=value pairs
kubectl create secret generic db-credentials \
--from-literal=username=admin \
--from-literal=password='S3cur3P@ssw0rd!' \
--from-literal=host=db.example.com \
--from-literal=port=5432
## Verify the Secret was created
kubectl get secret db-credentials
## View the Secret details (values are base64-encoded)
kubectl get secret db-credentials -o yaml
Quoting Special Characters
When using --from-literal, wrap values containing special characters ($, !, @, #, etc.) in single quotes to prevent shell interpretation. For example: --from-literal=password='My$ecret!'.
From Files¶
## Create sample credential files
echo -n "admin" > /tmp/username.txt
echo -n "S3cur3P@ssw0rd!" > /tmp/password.txt
## Create a Secret from files
## Each file becomes a key (filename) with the file contents as the value
kubectl create secret generic file-credentials \
--from-file=/tmp/username.txt \
--from-file=/tmp/password.txt
## Verify the Secret was created
kubectl get secret file-credentials -o yaml
## You can also specify a custom key name using key=path syntax
kubectl create secret generic file-credentials-custom \
--from-file=db-user=/tmp/username.txt \
--from-file=db-pass=/tmp/password.txt
## Verify the custom key names
kubectl get secret file-credentials-custom -o jsonpath='{.data}' | python3 -m json.tool
## Clean up temporary files
rm /tmp/username.txt /tmp/password.txt
Trailing Newlines
Use echo -n (without trailing newline) when creating files for Secrets. A trailing newline character will be included in the Secret value and can cause authentication failures.
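A quick illustration of why the trailing newline matters – the stored bytes, and therefore the base64 form, differ:

```python
import base64

with_newline = "admin\n"   # what plain 'echo admin > file' produces
without = "admin"          # what 'echo -n admin > file' produces

print(base64.b64encode(without.encode()).decode())       # YWRtaW4=
print(base64.b64encode(with_newline.encode()).decode())  # YWRtaW4K

# The two values are different credentials as far as any server is concerned
print(with_newline == without)  # False
```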
From an Env File¶
## Create an env file with key=value pairs (one per line)
cat <<'EOF' > /tmp/app-secrets.env
DB_HOST=db.example.com
DB_PORT=5432
DB_USER=admin
DB_PASSWORD=S3cur3P@ssw0rd!
API_KEY=sk-1234567890abcdef
EOF
## Create a Secret from the env file
## Each line becomes a separate key-value pair in the Secret
kubectl create secret generic env-credentials \
--from-env-file=/tmp/app-secrets.env
## Verify the Secret keys match the env file entries
kubectl get secret env-credentials -o jsonpath='{.data}' | python3 -m json.tool
## Clean up
rm /tmp/app-secrets.env
Inspect the Created Secrets¶
## List all Secrets in the namespace
kubectl get secrets
## Describe a Secret to see metadata (values are hidden)
kubectl describe secret db-credentials
## Decode a specific value from a Secret
kubectl get secret db-credentials -o jsonpath='{.data.password}' | base64 --decode
echo ## Add a newline for readability
## Decode ALL values from a Secret using a one-liner
kubectl get secret db-credentials -o json | \
python3 -c "import json,sys,base64; \
data=json.load(sys.stdin)['data']; \
[print(f'{k}: {base64.b64decode(v).decode()}') for k,v in data.items()]"
Step 03 - Create Secrets Declaratively¶
- For production workflows, you typically define Secrets in YAML manifests and manage them through GitOps pipelines.
- Kubernetes supports two fields for providing Secret data: data (base64-encoded) and stringData (plain text).
Using stringData (Recommended for Readability)¶
## Apply the Opaque Secret manifest that uses stringData
kubectl apply -f manifests/secret-opaque.yaml
## Verify the Secret was created
kubectl get secret app-credentials
## View the Secret -- notice that stringData values are now base64-encoded
## under the 'data' field (stringData is a write-only convenience field)
kubectl get secret app-credentials -o yaml
Using data (Base64-Encoded)¶
## When using the 'data' field, you must base64-encode values yourself
## Encode a value to base64
echo -n "my-secret-value" | base64
## Output: bXktc2VjcmV0LXZhbHVl
## Decode a base64 value back to plain text
echo "bXktc2VjcmV0LXZhbHVl" | base64 --decode
## Output: my-secret-value
## Example: Secret using the 'data' field with pre-encoded values
## (You do NOT need to apply this -- it is shown for comparison)
apiVersion: v1
kind: Secret
metadata:
name: base64-example
namespace: secrets-lab
type: Opaque
data:
## Each value must be base64-encoded
username: YWRtaW4= ## base64("admin")
password: UzNjdXIzUEBzc3cwcmQh ## base64("S3cur3P@ssw0rd!")
stringData vs. data
- Use stringData for readability during development and when values are plain text.
- Use data when you have already-encoded values or when generating manifests programmatically.
- If both stringData and data contain the same key, the stringData value takes precedence.
- After a Secret is created, Kubernetes always stores and returns values in the base64-encoded data field. The stringData field does not appear in kubectl get secret -o yaml output.
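The "generating manifests programmatically" case can be sketched in Python: build the data field by base64-encoding each value (the Secret name and keys here are illustrative):

```python
import base64
import json

def make_secret(name: str, namespace: str, values: dict) -> dict:
    """Build an Opaque Secret manifest with base64-encoded 'data' values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            k: base64.b64encode(v.encode()).decode() for k, v in values.items()
        },
    }

manifest = make_secret(
    "generated-credentials", "secrets-lab",
    {"username": "admin", "password": "S3cur3P@ssw0rd!"},
)
# JSON is valid YAML, so this output can be piped to 'kubectl apply -f -'
print(json.dumps(manifest, indent=2))
```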
Verify the Data Is Base64-Encoded¶
## Retrieve the Secret and decode the username
kubectl get secret app-credentials -o jsonpath='{.data.username}' | base64 --decode
echo ## Newline
## Output: admin
## Retrieve and decode the password
kubectl get secret app-credentials -o jsonpath='{.data.password}' | base64 --decode
echo ## Newline
## Output: S3cur3P@ssw0rd!
## Retrieve and decode the database URL
kubectl get secret app-credentials -o jsonpath='{.data.database-url}' | base64 --decode
echo ## Newline
## Output: postgres://admin:S3cur3P%40ssw0rd!@db-host:5432/mydb
Step 04 - Mount Secrets as Environment Variables¶
- One of the most common ways to consume Secrets is by injecting them as environment variables into a container.
- The Pod references specific keys from a Secret using secretKeyRef.
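The contents of manifests/pod-env-secret.yaml are not reproduced in this text; a Pod that maps individual Secret keys to environment variables via secretKeyRef typically looks like the following sketch (not necessarily the exact lab manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-env-demo
  namespace: secrets-lab
spec:
  containers:
    - name: demo
      image: busybox:1.36
      command: ["sh", "-c", "env | grep -E 'DB_|DATABASE_' && sleep 3600"]
      env:
        - name: DB_USERNAME
          valueFrom:
            secretKeyRef:
              name: app-credentials    ## the Secret created in Step 03
              key: username            ## the key within that Secret
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-credentials
              key: password
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-credentials
              key: database-url
  restartPolicy: Never
```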
## Ensure the app-credentials Secret exists (from Step 03)
kubectl get secret app-credentials
## Apply the Pod that mounts Secret values as environment variables
kubectl apply -f manifests/pod-env-secret.yaml
## Wait for the Pod to be running
kubectl wait --for=condition=Ready pod/secret-env-demo --timeout=60s
## View the Pod logs to see the environment variables in action
kubectl logs secret-env-demo
Expected output:
=== Secret values loaded as environment variables ===
DB_USERNAME=admin
DB_PASSWORD=S3cur3P@ssw0rd!
DATABASE_URL=postgres://admin:S3cur3P%40ssw0rd!@db-host:5432/mydb
=== Sleeping to keep pod alive for inspection ===
## You can also exec into the Pod and inspect the environment
kubectl exec secret-env-demo -- env | grep -E "DB_|DATABASE_"
## Verify a specific variable
kubectl exec secret-env-demo -- sh -c 'echo $DB_USERNAME'
Loading All Keys with envFrom¶
Instead of mapping individual keys, you can inject all keys from a Secret as environment variables:
## Create a Pod that loads all Secret keys as env vars using envFrom
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: secret-envfrom-demo
namespace: secrets-lab
spec:
containers:
- name: demo
image: busybox:1.36
command: ["sh", "-c", "env | sort | grep -v PATH && sleep 3600"]
envFrom:
- secretRef:
name: db-credentials
## Optional: add a prefix to all env var names
prefix: SECRET_
restartPolicy: Never
EOF
## Wait for the Pod and check the environment variables
kubectl wait --for=condition=Ready pod/secret-envfrom-demo --timeout=60s
kubectl logs secret-envfrom-demo | grep SECRET_
Environment Variable Risks
Environment variables are visible in process listings (/proc/<pid>/environ), crash dumps, and may be logged by applications. For highly sensitive data (private keys, TLS certificates), prefer volume mounts over environment variables.
Step 05 - Mount Secrets as Volumes¶
- Mounting Secrets as volumes creates files in the container’s filesystem.
- Each key in the Secret becomes a file, and the file content is the decoded Secret value.
- Volume-mounted Secrets are stored in tmpfs (a RAM-backed filesystem) and are never written to disk on the node.
## Ensure the app-credentials Secret exists
kubectl get secret app-credentials
## Apply the Pod that mounts Secret as a volume
kubectl apply -f manifests/pod-volume-secret.yaml
## Wait for the Pod to be running
kubectl wait --for=condition=Ready pod/secret-volume-demo --timeout=60s
## View the Pod logs to see the mounted files
kubectl logs secret-volume-demo
Expected output:
=== Secret files mounted as volume ===
--- Listing /etc/secrets ---
total 0
drwxrwx--T 2 root root 120 ... .
drwxr-xr-x 1 root root 28 ... ..
-r-------- 1 root root 5 ... username
-r-------- 1 root root 16 ... password
-r-------- 1 root root 52 ... database-url
--- Contents of each file ---
/etc/secrets/database-url: postgres://admin:S3cur3P%40ssw0rd!@db-host:5432/mydb
/etc/secrets/password: S3cur3P@ssw0rd!
/etc/secrets/username: admin
## Exec into the Pod and read individual Secret files
kubectl exec secret-volume-demo -- cat /etc/secrets/username
kubectl exec secret-volume-demo -- cat /etc/secrets/password
## Verify file permissions (should be 0400 as set in the manifest)
kubectl exec secret-volume-demo -- ls -la /etc/secrets/
## Verify the mount is tmpfs (in-memory filesystem)
kubectl exec secret-volume-demo -- df -T /etc/secrets/
Mounting Specific Keys Only¶
You can select which keys to mount and control their file paths using the items field:
## Create a Pod that mounts only the password key with a custom filename
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: secret-selective-mount
namespace: secrets-lab
spec:
containers:
- name: demo
image: busybox:1.36
command: ["sh", "-c", "ls -la /etc/secrets/ && cat /etc/secrets/db-password && sleep 3600"]
volumeMounts:
- name: secret-vol
mountPath: /etc/secrets
readOnly: true
volumes:
- name: secret-vol
secret:
secretName: app-credentials
items:
## Only mount the 'password' key, renaming the file to 'db-password'
- key: password
path: db-password
mode: 0400
restartPolicy: Never
EOF
## Wait and verify
kubectl wait --for=condition=Ready pod/secret-selective-mount --timeout=60s
kubectl logs secret-selective-mount
Automatic Updates
When a Secret is updated, volume-mounted Secrets are automatically updated by the kubelet (with a delay of up to the kubelet sync period, typically 1-2 minutes). Environment variables are NOT updated – the Pod must be restarted.
Step 06 - Create and Use Docker Registry Secrets¶
- When pulling container images from a private registry, Kubernetes needs credentials.
- The docker-registry Secret type stores these credentials in the format the kubelet expects.
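Under the hood, kubectl builds a JSON payload stored under the .dockerconfigjson key. A sketch of approximately that structure, using the same placeholder credentials as below:

```python
import base64
import json

# Placeholder credentials -- same shape kubectl create secret docker-registry builds
server = "https://index.docker.io/v1/"
username = "your-username"
password = "your-password"
email = "your-email@example.com"

dockerconfig = {
    "auths": {
        server: {
            "username": username,
            "password": password,
            "email": email,
            # 'auth' is base64("username:password") -- what registries actually check
            "auth": base64.b64encode(f"{username}:{password}".encode()).decode(),
        }
    }
}

# The Secret stores this JSON, itself base64-encoded, under .dockerconfigjson
print(json.dumps(dockerconfig, indent=2))
```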
## Create a docker-registry Secret
## Replace the placeholder values with your actual registry credentials
kubectl create secret docker-registry my-registry-secret \
--docker-server=https://index.docker.io/v1/ \
--docker-username=your-username \
--docker-password=your-password \
--docker-email=your-email@example.com
## View the created Secret
kubectl get secret my-registry-secret -o yaml
## The Secret contains a .dockerconfigjson key with the registry auth data
## Decode it to see the structure
kubectl get secret my-registry-secret \
-o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode | python3 -m json.tool
Using imagePullSecrets in a Pod¶
## Create a Pod that references the registry Secret for image pulling
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: private-image-pod
namespace: secrets-lab
spec:
containers:
- name: app
image: your-registry.example.com/your-app:latest
ports:
- containerPort: 8080
imagePullSecrets:
- name: my-registry-secret
restartPolicy: Never
EOF
Attaching imagePullSecrets to a ServiceAccount¶
Instead of adding imagePullSecrets to every Pod, you can attach it to a ServiceAccount so all Pods using that ServiceAccount automatically use the registry credentials:
## Create a ServiceAccount with the imagePullSecret attached
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
name: private-registry-sa
namespace: secrets-lab
imagePullSecrets:
- name: my-registry-secret
EOF
## Any Pod using this ServiceAccount will automatically pull from the private registry
## kubectl get serviceaccount private-registry-sa -o yaml
ServiceAccount Pattern
Attaching imagePullSecrets to a ServiceAccount is the recommended pattern for teams. It avoids repeating the Secret reference in every Pod spec and centralizes registry credential management.
Step 07 - Create and Use TLS Secrets¶
- TLS Secrets hold a certificate and its associated private key.
- They are commonly used for HTTPS termination in Ingress controllers or directly in application Pods.
Generate a Self-Signed Certificate¶
## Generate a self-signed TLS certificate and private key using openssl
## This creates a certificate valid for 365 days for localhost
openssl req -x509 \
-nodes \
-days 365 \
-newkey rsa:2048 \
-keyout /tmp/tls.key \
-out /tmp/tls.crt \
-subj "/CN=localhost/O=secrets-lab"
## Verify the certificate details
openssl x509 -in /tmp/tls.crt -text -noout | head -20
Create the TLS Secret¶
## Create a TLS Secret from the certificate and key files
## kubectl validates that the cert and key are valid and match
kubectl create secret tls tls-secret \
--cert=/tmp/tls.crt \
--key=/tmp/tls.key
## Verify the Secret was created with the correct type
kubectl get secret tls-secret
## The TYPE column should show kubernetes.io/tls
## View the Secret structure
kubectl get secret tls-secret -o yaml
## It contains two keys: tls.crt and tls.key
## Clean up local files
rm /tmp/tls.crt /tmp/tls.key
Deploy Nginx with TLS¶
## Apply the nginx TLS pod and its ConfigMap
kubectl apply -f manifests/tls-pod.yaml
## Wait for the Pod to be running
kubectl wait --for=condition=Ready pod/nginx-tls-demo --timeout=60s
## Test the HTTPS endpoint from inside the cluster
## The -k flag tells curl to accept the self-signed certificate
kubectl exec nginx-tls-demo -- curl -k https://localhost:443 2>/dev/null
## Output: TLS is working! Served by nginx with a self-signed certificate.
## Verify the certificate details served by nginx
kubectl exec nginx-tls-demo -- \
sh -c "echo | openssl s_client -connect localhost:443 2>/dev/null | openssl x509 -noout -subject -dates"
TLS in Ingress
In production, TLS Secrets are typically referenced in Ingress resources rather than mounted directly in Pods. Ingress controllers handle TLS termination centrally. Tools like cert-manager can automate certificate issuance and renewal.
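For reference, a TLS Secret is consumed from an Ingress via spec.tls. A sketch with a hypothetical hostname and backend Service (the secretName matches the tls-secret created in this step):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress
  namespace: secrets-lab
spec:
  tls:
    - hosts:
        - demo.example.com          ## hypothetical hostname
      secretName: tls-secret        ## the kubernetes.io/tls Secret from this step
  rules:
    - host: demo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo-service  ## hypothetical backend Service
                port:
                  number: 80
```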
Step 08 - Using Projected Volumes¶
- Projected volumes allow you to combine multiple volume sources into a single mount directory.
- This is useful when a container needs both Secrets and ConfigMaps accessible from the same path.
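The contents of manifests/pod-projected-volume.yaml are not reproduced here; a projected volume that unifies a Secret and a ConfigMap under one mount typically looks like this sketch (the ConfigMap name is illustrative, not necessarily the exact lab manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: projected-volume-demo
  namespace: secrets-lab
spec:
  containers:
    - name: demo
      image: busybox:1.36
      command: ["sh", "-c", "ls -la /etc/app-config && sleep 3600"]
      volumeMounts:
        - name: app-config
          mountPath: /etc/app-config
          readOnly: true
  volumes:
    - name: app-config
      projected:
        sources:
          ## Secret keys become files: username, password, database-url
          - secret:
              name: app-credentials
          ## ConfigMap keys become files alongside them (illustrative name)
          - configMap:
              name: app-config
  restartPolicy: Never
```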
## Apply the projected volume demo (includes both a ConfigMap and a Pod)
kubectl apply -f manifests/pod-projected-volume.yaml
## Wait for the Pod to be running
kubectl wait --for=condition=Ready pod/projected-volume-demo --timeout=60s
## View the Pod logs to see all projected files
kubectl logs projected-volume-demo
Expected output:
=== Projected volume contents ===
--- Listing /etc/app-config ---
...
--- Secret: username ---
admin
--- Secret: password ---
S3cur3P@ssw0rd!
--- ConfigMap: app.properties ---
app.name=secrets-demo
app.version=1.0.0
app.environment=development
app.log.level=info
--- ConfigMap: feature-flags.json ---
{
"enableNewUI": true,
"enableBetaFeatures": false,
"maxRetries": 3
}
=== All config and secrets unified under one mount ===
## Exec into the Pod and explore the projected mount
kubectl exec projected-volume-demo -- ls -la /etc/app-config/
## Read a specific file
kubectl exec projected-volume-demo -- cat /etc/app-config/app.properties
Projected Volume Sources
Projected volumes can combine: secret, configMap, downwardAPI, and serviceAccountToken sources. This is powerful for applications that expect all configuration in a single directory.
Step 09 - Immutable Secrets¶
- Kubernetes supports marking Secrets as immutable starting from v1.21 (GA).
- Once a Secret is marked as immutable: true, its data and stringData fields cannot be updated.
- The only way to change an immutable Secret is to delete it and recreate it.
Benefits of Immutable Secrets¶
- Security: Prevents accidental or malicious modifications to critical credentials
- Performance: The kubelet does not need to set up watches for immutable Secrets, reducing API server load
- Reliability: Guarantees that the Secret data remains consistent throughout its lifetime
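The manifest applied in the next command could look like the following sketch (the key name and value are illustrative; the actual manifests/immutable-secret.yaml in the lab may differ):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: immutable-api-key
  namespace: secrets-lab
type: Opaque
immutable: true          # once set, the data of this Secret can never be updated
stringData:
  api-key: example-key-123   # illustrative value
```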
## Apply the immutable Secret
kubectl apply -f manifests/immutable-secret.yaml
## Verify it was created
kubectl get secret immutable-api-key
kubectl get secret immutable-api-key -o jsonpath='{.immutable}'
echo ## Newline
## Output: true
Attempt to Modify an Immutable Secret¶
## Try to update the immutable Secret -- this will FAIL
kubectl patch secret immutable-api-key \
--type='json' \
-p='[{"op": "replace", "path": "/data/api-key", "value": "bmV3LWtleQ=="}]'
## Expected error:
## Error from server (Forbidden): secrets "immutable-api-key" is forbidden:
## field is immutable when `immutable` is set
## Also try editing -- this will also fail on save
## kubectl edit secret immutable-api-key
## (Uncomment and try if you want to see the error)
## The ONLY way to change an immutable Secret is to delete and recreate it
kubectl delete secret immutable-api-key
kubectl apply -f manifests/immutable-secret.yaml
Immutable Cannot Be Reversed
Once immutable: true is set on a Secret, you cannot change it back to false. The only option is to delete and recreate the Secret. Plan accordingly before marking Secrets as immutable.
Step 10 - Secret Rotation and Pod Restart Strategies¶
- In production, credentials must be rotated regularly for security compliance.
- When a Secret is updated, volume-mounted Secrets are automatically refreshed, but environment variables are not.
- For environment variable-based Secrets, you need to trigger a Pod restart.
Update a Secret¶
## Update the db-credentials Secret with a new password
kubectl create secret generic db-credentials \
--from-literal=username=admin \
--from-literal=password='N3wR0t@tedP@ss!' \
--from-literal=host=db.example.com \
--from-literal=port=5432 \
--dry-run=client -o yaml | kubectl apply -f -
## Verify the password was updated
kubectl get secret db-credentials -o jsonpath='{.data.password}' | base64 --decode
echo ## Newline
## Output: N3wR0t@tedP@ss!
Verify Volume Mount Auto-Update¶
## If you still have the secret-volume-demo Pod running from Step 05,
## check that the mounted files reflect the new Secret data.
## (There may be a delay of up to the kubelet sync period, typically 1-2 minutes)
## Wait a moment for the kubelet to sync
sleep 120
## Check the updated file content
kubectl exec secret-volume-demo -- cat /etc/secrets/password 2>/dev/null || \
echo "Pod secret-volume-demo is not running. Recreate it to test volume auto-update."
Trigger a Rolling Restart for Deployments¶
When Pods consume Secrets as environment variables, updating the Secret does not restart the Pod. Use one of these strategies:
## Strategy 1: kubectl rollout restart (simplest, requires Deployment/StatefulSet)
## Create a sample Deployment first
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: secret-consumer
namespace: secrets-lab
spec:
replicas: 2
selector:
matchLabels:
app: secret-consumer
template:
metadata:
labels:
app: secret-consumer
spec:
containers:
- name: app
image: busybox:1.36
command: ["sh", "-c", "while true; do echo DB_PASSWORD=$DB_PASSWORD; sleep 30; done"]
env:
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: db-credentials
key: password
EOF
## Wait for the Deployment to be ready
kubectl rollout status deployment/secret-consumer --timeout=60s
## Now trigger a rolling restart to pick up the updated Secret
kubectl rollout restart deployment/secret-consumer
## Wait for the rollout to complete
kubectl rollout status deployment/secret-consumer --timeout=60s
## Verify the new pods have the updated password
kubectl logs deployment/secret-consumer | head -5
## Strategy 2: Annotation-based trigger (useful in Helm/GitOps workflows)
## Add a hash of the Secret data as a pod annotation.
## When the Secret changes, the annotation changes, triggering a rollout.
SECRET_HASH=$(kubectl get secret db-credentials -o jsonpath='{.data}' | sha256sum | cut -d' ' -f1)
kubectl patch deployment secret-consumer \
-p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"secret-hash\":\"$SECRET_HASH\"}}}}}"
## Verify the annotation was added
kubectl get deployment secret-consumer -o jsonpath='{.spec.template.metadata.annotations}' | python3 -m json.tool
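The annotation trick works because any change to the Secret's data map changes its hash, and a changed pod-template annotation forces a new rollout. A quick local sketch of that property, with no cluster needed (the JSON values are illustrative):

```shell
## Two versions of a Secret's data map (illustrative values)
OLD_DATA='{"password":"b2xkLXBhc3M="}'
NEW_DATA='{"password":"bmV3LXBhc3M="}'

## Hash each version the same way the rollout trigger does
OLD_HASH=$(printf '%s' "$OLD_DATA" | sha256sum | cut -d' ' -f1)
NEW_HASH=$(printf '%s' "$NEW_DATA" | sha256sum | cut -d' ' -f1)

## Different data => different hash => the pod-template annotation changes,
## which is exactly what triggers a new rollout
[ "$OLD_HASH" != "$NEW_HASH" ] && echo "hash changed -- rollout would trigger"
```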
## Strategy 3: Automated rotation script
## This script updates a Secret and triggers a rolling restart
cat <<'SCRIPT'
#!/bin/bash
## rotate-secret.sh - Rotate a Secret and restart the consuming Deployment
SECRET_NAME="${1:?Usage: rotate-secret.sh <secret-name> <deployment-name>}"
DEPLOYMENT_NAME="${2:?Usage: rotate-secret.sh <secret-name> <deployment-name>}"
NAMESPACE="${3:-secrets-lab}"
## Generate a new password
NEW_PASSWORD=$(openssl rand -base64 24)
## Update the Secret
kubectl create secret generic "$SECRET_NAME" \
--from-literal=username=admin \
--from-literal=password="$NEW_PASSWORD" \
--from-literal=host=db.example.com \
--from-literal=port=5432 \
--namespace="$NAMESPACE" \
--dry-run=client -o yaml | kubectl apply -f -
echo "Secret '$SECRET_NAME' updated with new password."
## Trigger rolling restart
kubectl rollout restart deployment/"$DEPLOYMENT_NAME" -n "$NAMESPACE"
echo "Rolling restart triggered for deployment '$DEPLOYMENT_NAME'."
## Wait for rollout
kubectl rollout status deployment/"$DEPLOYMENT_NAME" -n "$NAMESPACE" --timeout=120s
echo "Rotation complete."
SCRIPT
Step 11 - Enable Encryption at Rest¶
- By default, Secrets are stored unencrypted in etcd.
- Encryption at rest ensures that Secrets are encrypted before being written to etcd.
- This step requires access to the control-plane node and the ability to modify kube-apiserver flags.
Control Plane Access Required
This step can only be performed on clusters where you have control-plane access (e.g., kubeadm clusters, bare-metal). Managed Kubernetes services (EKS, GKE, AKS) typically handle encryption at rest automatically or through their own configuration mechanisms.
Review the EncryptionConfiguration¶
## The EncryptionConfiguration specifies:
## - Which resources to encrypt (secrets)
## - Which encryption provider to use (aescbc)
## - The encryption key (base64-encoded 32-byte key)
## - A fallback identity provider (for reading unencrypted data)
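Putting those four points together, an EncryptionConfiguration might look like this sketch (the key shown is a placeholder; generate your own in the next step):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets                 # which resources to encrypt
  providers:
  - aescbc:                 # encryption provider used for writes
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>   # replace with your generated key
  - identity: {}            # fallback so existing unencrypted data stays readable
```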
Generate an Encryption Key¶
## Generate a random 32-byte encryption key and base64-encode it
head -c 32 /dev/urandom | base64
## Use this output as the 'secret' value in the EncryptionConfiguration
Apply Encryption (kubeadm clusters)¶
Cluster-Specific Instructions
The following steps apply to kubeadm-based clusters. For managed Kubernetes services, consult your provider’s documentation.
## 1. Copy the EncryptionConfiguration to the control-plane node
## (Replace the key in the file with your generated key first!)
## sudo cp encryption-config.yaml /etc/kubernetes/encryption-config.yaml
## 2. Edit the kube-apiserver manifest to add the encryption flag
## sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
## Add under spec.containers.command:
## - --encryption-provider-config=/etc/kubernetes/encryption-config.yaml
## Add under spec.containers.volumeMounts:
## - name: encryption-config
## mountPath: /etc/kubernetes/encryption-config.yaml
## readOnly: true
## Add under spec.volumes:
## - name: encryption-config
## hostPath:
## path: /etc/kubernetes/encryption-config.yaml
## type: File
## 3. The kube-apiserver will restart automatically (it's a static Pod)
## Wait for it to come back up
## kubectl get pods -n kube-system -l component=kube-apiserver
## 4. Re-encrypt all existing Secrets so they are stored encrypted
## kubectl get secrets --all-namespaces -o json | kubectl replace -f -
## 5. Verify encryption is working by checking etcd directly
## ETCDCTL_API=3 etcdctl get /registry/secrets/default/my-secret \
## --cacert=/etc/kubernetes/pki/etcd/ca.crt \
## --cert=/etc/kubernetes/pki/etcd/server.crt \
## --key=/etc/kubernetes/pki/etcd/server.key
## The output should show encrypted (non-readable) data prefixed with "k8s:enc:aescbc:v1:key1"
Managed Kubernetes Encryption
- GKE: Secrets are encrypted at rest by default. You can use customer-managed encryption keys (CMEK) via Cloud KMS.
- EKS: Enable envelope encryption with AWS KMS through the --encryption-config flag in the cluster config.
- AKS: Enable encryption at rest with customer-managed keys through Azure Key Vault.
Exercises¶
The following exercises will test your understanding of Kubernetes Secrets. Try to solve each exercise on your own before revealing the solution.
01. Create a Secret from Multiple Files and Verify Its Content¶
Create a Secret named multi-file-secret from three separate files containing a username, password, and API token. Then verify you can retrieve and decode each value.
Scenario:¶
- You have received three credential files from your security team.
- Each file contains a single credential value that must be stored in Kubernetes.
- You need to create a single Secret containing all three credentials.
Hint: Use kubectl create secret generic --from-file with multiple --from-file flags or a directory.
Solution
## Create temporary credential files
echo -n "lab-admin" > /tmp/username
echo -n "Ex3rc1se#1!" > /tmp/password
echo -n "tok-abc123def456" > /tmp/api-token
## Create the Secret from multiple files
kubectl create secret generic multi-file-secret \
--from-file=/tmp/username \
--from-file=/tmp/password \
--from-file=/tmp/api-token
## Verify the Secret exists and has 3 data entries
kubectl get secret multi-file-secret
## DATA column should show 3
## Decode each value to verify correctness
kubectl get secret multi-file-secret -o jsonpath='{.data.username}' | base64 --decode
echo ## Newline -- Output: lab-admin
kubectl get secret multi-file-secret -o jsonpath='{.data.password}' | base64 --decode
echo ## Newline -- Output: Ex3rc1se#1!
kubectl get secret multi-file-secret -o jsonpath='{.data.api-token}' | base64 --decode
echo ## Newline -- Output: tok-abc123def456
## Clean up
rm /tmp/username /tmp/password /tmp/api-token
kubectl delete secret multi-file-secret
02. Mount a Secret as an Environment Variable and Prove the Pod Can Read It¶
Create a Secret named greeting-secret with a key message containing the value Hello from Kubernetes Secrets!. Then create a Pod that reads this value as an environment variable and prints it.
Scenario:¶
- Your application reads its greeting message from an environment variable named GREETING.
- The message is sensitive (perhaps it contains an internal URL) and should be stored in a Secret.
- You need to verify the application can read the value correctly.
Hint: Use secretKeyRef in the Pod’s env section and kubectl logs to verify.
Solution
## Create the Secret with the greeting message
kubectl create secret generic greeting-secret \
--from-literal=message="Hello from Kubernetes Secrets!"
## Create a Pod that reads the Secret as an environment variable
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: greeting-pod
namespace: secrets-lab
spec:
containers:
- name: greeter
image: busybox:1.36
command: ["sh", "-c", "echo $GREETING"]
env:
- name: GREETING
valueFrom:
secretKeyRef:
name: greeting-secret
key: message
restartPolicy: Never
EOF
## Wait for the Pod to complete
kubectl wait --for=condition=Ready pod/greeting-pod --timeout=30s 2>/dev/null || sleep 5
## Check the logs -- should print the greeting
kubectl logs greeting-pod
## Output: Hello from Kubernetes Secrets!
## Clean up
kubectl delete pod greeting-pod
kubectl delete secret greeting-secret
03. Create a Secret with stringData and Verify It Gets Base64-Encoded¶
Create a Secret declaratively using the stringData field with a username of developer and a password of PlainText123. After applying, verify that the values are stored as base64 in the data field.
Scenario:¶
- You want to create a Secret using plain-text values for convenience.
- You need to confirm that Kubernetes correctly encodes the values before storing them.
- Understanding this encoding behavior is essential for debugging Secret issues.
Hint: Use kubectl apply with a YAML manifest containing stringData, then inspect with kubectl get secret -o yaml.
Solution
## Create the Secret using stringData
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
name: stringdata-demo
namespace: secrets-lab
type: Opaque
stringData:
username: developer
password: PlainText123
EOF
## View the Secret -- notice stringData is converted to data with base64
kubectl get secret stringdata-demo -o yaml
## Manually verify the base64 encoding
echo -n "developer" | base64
## Output: ZGV2ZWxvcGVy
echo -n "PlainText123" | base64
## Output: UGxhaW5UZXh0MTIz
## Compare with what Kubernetes stored
kubectl get secret stringdata-demo -o jsonpath='{.data.username}'
echo ## Should output: ZGV2ZWxvcGVy
kubectl get secret stringdata-demo -o jsonpath='{.data.password}'
echo ## Should output: UGxhaW5UZXh0MTIz
## Decode to confirm round-trip integrity
kubectl get secret stringdata-demo -o jsonpath='{.data.username}' | base64 --decode
echo ## Output: developer
kubectl get secret stringdata-demo -o jsonpath='{.data.password}' | base64 --decode
echo ## Output: PlainText123
## Clean up
kubectl delete secret stringdata-demo
04. Mount a Secret as a Volume and Read the File Inside the Pod¶
Create a Secret named config-secret with two keys: db.conf containing host=db.local\nport=5432 and cache.conf containing host=redis.local\nport=6379. Mount it as a volume at /etc/config and read both files from inside the Pod.
Scenario:¶
- Your application reads configuration from files in /etc/config/.
- Some configuration values are sensitive (database and cache connection details).
- You need the files to appear as regular files that the application can read.
Hint: Create the Secret with --from-literal, mount as a volume with secret.secretName, and use kubectl exec to read files.
Solution
## Create the Secret with configuration data
kubectl create secret generic config-secret \
--from-literal=db.conf=$'host=db.local\nport=5432' \
--from-literal=cache.conf=$'host=redis.local\nport=6379'
## Create a Pod that mounts the Secret as a volume
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: config-reader
namespace: secrets-lab
spec:
containers:
- name: reader
image: busybox:1.36
command: ["sh", "-c", "sleep 3600"]
volumeMounts:
- name: config-vol
mountPath: /etc/config
readOnly: true
volumes:
- name: config-vol
secret:
secretName: config-secret
restartPolicy: Never
EOF
## Wait for the Pod
kubectl wait --for=condition=Ready pod/config-reader --timeout=60s
## List the files in the mounted directory
kubectl exec config-reader -- ls -la /etc/config/
## Read the database configuration
kubectl exec config-reader -- cat /etc/config/db.conf
## Output:
## host=db.local
## port=5432
## Read the cache configuration
kubectl exec config-reader -- cat /etc/config/cache.conf
## Output:
## host=redis.local
## port=6379
## Clean up
kubectl delete pod config-reader
kubectl delete secret config-secret
05. Create a TLS Secret and Mount It in an Nginx Pod¶
Generate a self-signed TLS certificate, create a kubernetes.io/tls Secret, and deploy an nginx Pod that serves HTTPS traffic using the certificate.
Scenario:¶
- Your team needs to test HTTPS termination at the Pod level.
- You must generate a self-signed certificate, store it as a Kubernetes TLS Secret, and verify nginx can serve HTTPS responses.
Hint: Use openssl req to generate the cert/key, kubectl create secret tls to create the Secret, and kubectl exec with curl -k to test.
Solution
## Generate a self-signed certificate
openssl req -x509 -nodes -days 30 \
-newkey rsa:2048 \
-keyout /tmp/exercise-tls.key \
-out /tmp/exercise-tls.crt \
-subj "/CN=exercise.local/O=exercise"
## Create the TLS Secret
kubectl create secret tls exercise-tls-secret \
--cert=/tmp/exercise-tls.crt \
--key=/tmp/exercise-tls.key
## Verify the Secret type
kubectl get secret exercise-tls-secret
## TYPE should be kubernetes.io/tls
## Create an nginx ConfigMap for HTTPS
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: exercise-nginx-conf
namespace: secrets-lab
data:
default.conf: |
server {
listen 443 ssl;
ssl_certificate /etc/nginx/ssl/tls.crt;
ssl_certificate_key /etc/nginx/ssl/tls.key;
location / {
return 200 'Exercise 05: TLS is working!\n';
add_header Content-Type text/plain;
}
}
EOF
## Deploy nginx with the TLS Secret
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: exercise-nginx-tls
namespace: secrets-lab
spec:
containers:
- name: nginx
image: nginx:1.25-alpine
ports:
- containerPort: 443
volumeMounts:
- name: tls-certs
mountPath: /etc/nginx/ssl
readOnly: true
- name: nginx-config
mountPath: /etc/nginx/conf.d
readOnly: true
volumes:
- name: tls-certs
secret:
secretName: exercise-tls-secret
- name: nginx-config
configMap:
name: exercise-nginx-conf
restartPolicy: Never
EOF
## Wait for the Pod
kubectl wait --for=condition=Ready pod/exercise-nginx-tls --timeout=60s
## Test HTTPS
kubectl exec exercise-nginx-tls -- curl -k https://localhost:443 2>/dev/null
## Output: Exercise 05: TLS is working!
## Clean up
rm /tmp/exercise-tls.crt /tmp/exercise-tls.key
kubectl delete pod exercise-nginx-tls
kubectl delete configmap exercise-nginx-conf
kubectl delete secret exercise-tls-secret
06. Create an imagePullSecret and Reference It in a Deployment¶
Create a docker-registry Secret for a fictional private registry and create a Deployment that references it via imagePullSecrets.
Scenario:¶
- Your organization uses a private container registry at registry.internal.example.com.
- All Pods pulling from this registry need credentials.
- You need to set up the credentials and reference them in a Deployment.
Hint: Use kubectl create secret docker-registry and add imagePullSecrets to the Deployment’s Pod template spec.
Solution
## Create the docker-registry Secret
kubectl create secret docker-registry internal-registry \
--docker-server=registry.internal.example.com \
--docker-username=deploy-bot \
--docker-password='R3g1stryP@ss!' \
--docker-email=deploy@example.com
## Verify the Secret
kubectl get secret internal-registry
## TYPE should be kubernetes.io/dockerconfigjson
## Create a Deployment referencing the imagePullSecret
## (This will fail to pull since the registry is fictional, but the
## configuration is correct -- check the Pod spec for the reference)
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: private-app
namespace: secrets-lab
spec:
replicas: 1
selector:
matchLabels:
app: private-app
template:
metadata:
labels:
app: private-app
spec:
containers:
- name: app
## Using nginx as a stand-in; in production this would be
## registry.internal.example.com/my-app:latest
image: nginx:1.25-alpine
ports:
- containerPort: 80
imagePullSecrets:
- name: internal-registry
EOF
## Verify the Deployment references the imagePullSecret
kubectl get deployment private-app -o jsonpath='{.spec.template.spec.imagePullSecrets}' | python3 -m json.tool
## Clean up
kubectl delete deployment private-app
kubectl delete secret internal-registry
07. Update a Secret and Verify the Volume Mount Reflects the Change¶
Create a Secret, mount it as a volume in a Pod, update the Secret, and verify that the file content inside the Pod changes automatically.
Scenario:¶
- You need to update an API key without restarting the application.
- Volume-mounted Secrets are automatically refreshed by the kubelet.
- You need to prove this auto-update behavior works.
Hint: Create a Secret and a long-running Pod with a volume mount. Use kubectl create --dry-run=client -o yaml | kubectl apply -f - to update the Secret, then wait and re-read the file.
Solution
## Create the initial Secret
kubectl create secret generic rotating-secret \
--from-literal=api-key="original-key-v1"
## Create a long-running Pod that mounts the Secret
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: secret-watcher
namespace: secrets-lab
spec:
containers:
- name: watcher
image: busybox:1.36
command: ["sh", "-c", "while true; do echo \"$(date): $(cat /etc/secrets/api-key)\"; sleep 10; done"]
volumeMounts:
- name: secret-vol
mountPath: /etc/secrets
readOnly: true
volumes:
- name: secret-vol
secret:
secretName: rotating-secret
restartPolicy: Never
EOF
## Wait for the Pod
kubectl wait --for=condition=Ready pod/secret-watcher --timeout=60s
## Verify the current value
kubectl exec secret-watcher -- cat /etc/secrets/api-key
## Output: original-key-v1
## Update the Secret with a new value
kubectl create secret generic rotating-secret \
--from-literal=api-key="rotated-key-v2" \
--dry-run=client -o yaml | kubectl apply -f -
## Wait for the kubelet to sync (up to ~2 minutes)
echo "Waiting for kubelet to sync the updated Secret..."
sleep 120
## Verify the value has been updated inside the Pod
kubectl exec secret-watcher -- cat /etc/secrets/api-key
## Output: rotated-key-v2
## Check the logs for the transition
kubectl logs secret-watcher | tail -20
## Clean up
kubectl delete pod secret-watcher
kubectl delete secret rotating-secret
08. Create an Immutable Secret and Try to Modify It¶
Create a Secret marked as immutable: true and attempt to update its data. Verify that the update is rejected by the API server.
Scenario:¶
- Your compliance team requires that production API keys cannot be modified after deployment.
- You need to prove that the immutable flag prevents changes.
- You also need to understand the only way to change an immutable Secret.
Hint: Create a Secret with immutable: true in the YAML, then try kubectl patch or kubectl edit to modify it.
Solution
## Create an immutable Secret
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
name: locked-secret
namespace: secrets-lab
type: Opaque
immutable: true
stringData:
token: "immutable-token-abc123"
EOF
## Verify it's immutable
kubectl get secret locked-secret -o jsonpath='{.immutable}'
echo ## Output: true
## Attempt to modify the Secret data -- this WILL FAIL
kubectl patch secret locked-secret \
--type='json' \
-p='[{"op": "replace", "path": "/data/token", "value": "bmV3LXRva2Vu"}]' 2>&1 || true
## Expected error: field is immutable when `immutable` is set
## Attempt to add a new key -- this WILL also FAIL
kubectl patch secret locked-secret \
--type='merge' \
-p='{"stringData": {"new-key": "new-value"}}' 2>&1 || true
## The ONLY way to change it is delete + recreate
kubectl delete secret locked-secret
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
name: locked-secret
namespace: secrets-lab
type: Opaque
immutable: true
stringData:
token: "new-immutable-token-xyz789"
EOF
## Verify the new value
kubectl get secret locked-secret -o jsonpath='{.data.token}' | base64 --decode
echo ## Output: new-immutable-token-xyz789
## Clean up
kubectl delete secret locked-secret
09. Use Projected Volumes to Combine a Secret and a ConfigMap¶
Create a Secret and a ConfigMap, then mount both into a single directory in a Pod using a projected volume. Verify all files appear under the same mount path.
Scenario:¶
- Your application expects all configuration (sensitive and non-sensitive) in /app/config/.
- You cannot use two separate mount paths; the application reads from a single directory.
- You need to combine a Secret and a ConfigMap into one unified mount.
Hint: Use a projected volume with sources containing both secret and configMap entries.
Solution
## Create the Secret
kubectl create secret generic app-secret \
--from-literal=db-password="pr0j3ct3d!"
## Create the ConfigMap
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: app-settings
namespace: secrets-lab
data:
app.env: "production"
log.level: "warn"
EOF
## Create a Pod with a projected volume combining both
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: projected-demo
namespace: secrets-lab
spec:
containers:
- name: app
image: busybox:1.36
command: ["sh", "-c", "ls -la /app/config/ && echo '---' && for f in /app/config/*; do echo \"$f: $(cat $f)\"; done && sleep 3600"]
volumeMounts:
- name: all-config
mountPath: /app/config
readOnly: true
volumes:
- name: all-config
projected:
sources:
- secret:
name: app-secret
items:
- key: db-password
path: db-password
- configMap:
name: app-settings
items:
- key: app.env
path: app.env
- key: log.level
path: log.level
restartPolicy: Never
EOF
## Wait and check the logs
kubectl wait --for=condition=Ready pod/projected-demo --timeout=60s
kubectl logs projected-demo
## Expected output shows all three files under /app/config/:
## db-password (from Secret)
## app.env (from ConfigMap)
## log.level (from ConfigMap)
## Verify individual files
kubectl exec projected-demo -- cat /app/config/db-password
## Output: pr0j3ct3d!
kubectl exec projected-demo -- cat /app/config/app.env
## Output: production
## Clean up
kubectl delete pod projected-demo
kubectl delete secret app-secret
kubectl delete configmap app-settings
10. Decode All Values in a Secret Using a One-Liner¶
Given a Secret named multi-key-secret with several keys, write a single command that decodes and displays all key-value pairs.
Scenario:¶
- You are debugging an application and need to quickly inspect all values in a Secret.
- Running separate jsonpath + base64 --decode commands for each key is tedious.
- You need a single command that outputs all decoded key-value pairs.
Hint: Use kubectl get secret -o json piped to a tool that iterates over the data map and decodes each value.
Solution
## Create a Secret with multiple keys for testing
kubectl create secret generic multi-key-secret \
--from-literal=key1="value-one" \
--from-literal=key2="value-two" \
--from-literal=key3="value-three" \
--from-literal=api-token="tok-abc123"
## Method 1: Using python3 (available on most systems)
kubectl get secret multi-key-secret -o json | \
python3 -c "
import json, sys, base64
data = json.load(sys.stdin)['data']
for k, v in sorted(data.items()):
print(f'{k}: {base64.b64decode(v).decode()}')
"
## Method 2: Using kubectl jsonpath + bash loop
for key in $(kubectl get secret multi-key-secret -o jsonpath='{.data}' | python3 -c "import json,sys; [print(k) for k in json.load(sys.stdin)]"); do
value=$(kubectl get secret multi-key-secret -o jsonpath="{.data.$key}" | base64 --decode)
echo "$key: $value"
done
## Method 3: Using go-template
kubectl get secret multi-key-secret -o go-template='{{range $k, $v := .data}}{{$k}}: {{$v | base64decode}}
{{end}}'
## Expected output (any method):
## api-token: tok-abc123
## key1: value-one
## key2: value-two
## key3: value-three
## Clean up
kubectl delete secret multi-key-secret
11. Create a Secret with Special Characters in Values¶
Create a Secret containing values with special characters: backslashes, quotes, dollar signs, newlines, and unicode characters. Verify they survive the encoding/decoding round-trip.
Scenario:¶
- A database password generated by your security tool contains complex special characters.
- You need to ensure these characters are not corrupted during Secret creation and retrieval.
- Shell escaping and base64 encoding can sometimes mangle special characters.
Hint: Use --from-file with a file containing the exact value to avoid shell escaping issues, or use stringData in a YAML manifest.
Solution
## Method 1: Using --from-file to avoid shell escaping entirely
printf 'P@$$w0rd\n"quotes"\tand\\backslash' > /tmp/special-chars.txt
kubectl create secret generic special-secret \
--from-file=complex-value=/tmp/special-chars.txt
## Verify the round-trip
kubectl get secret special-secret -o jsonpath='{.data.complex-value}' | base64 --decode
## Output should exactly match the original value
## Method 2: Using stringData in a YAML manifest for more control
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
name: special-yaml-secret
namespace: secrets-lab
type: Opaque
stringData:
password: 'P@$$w0rd!"quotes"&<xml>'
multiline: |
line1: value1
line2: value2
line3: "quoted value"
json-config: '{"key": "value with \"quotes\" and $pecial chars"}'
EOF
## Verify each key
kubectl get secret special-yaml-secret -o jsonpath='{.data.password}' | base64 --decode
echo
## Output: P@$$w0rd!"quotes"&<xml>
kubectl get secret special-yaml-secret -o jsonpath='{.data.multiline}' | base64 --decode
## Output should preserve the multi-line format
kubectl get secret special-yaml-secret -o jsonpath='{.data.json-config}' | base64 --decode
echo
## Output: {"key": "value with \"quotes\" and $pecial chars"}
## Clean up
rm /tmp/special-chars.txt
kubectl delete secret special-secret
kubectl delete secret special-yaml-secret
12. Write a Script That Rotates a Secret and Triggers a Rolling Restart¶
Write a shell script that generates a new random password, updates a Secret, and triggers a rolling restart of the consuming Deployment. Verify the Pods pick up the new credential.
Scenario:¶
- Your security policy requires rotating database passwords every 30 days.
- The rotation must be automated and must not cause downtime.
- After rotation, all Pods must pick up the new password via a rolling restart.
Hint: Use openssl rand to generate a password, kubectl create secret --dry-run=client -o yaml | kubectl apply -f - to update, and kubectl rollout restart to trigger the restart.
Solution
## Create the initial Secret and Deployment
kubectl create secret generic rotatable-db-creds \
--from-literal=username=app-user \
--from-literal=password="initial-password-v1"
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: db-consumer
namespace: secrets-lab
spec:
replicas: 2
selector:
matchLabels:
app: db-consumer
template:
metadata:
labels:
app: db-consumer
spec:
containers:
- name: app
image: busybox:1.36
command: ["sh", "-c", "while true; do echo \"password=$DB_PASSWORD\"; sleep 30; done"]
env:
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: rotatable-db-creds
key: password
EOF
## Wait for the Deployment to be ready
kubectl rollout status deployment/db-consumer --timeout=60s
## Verify the current password
kubectl logs deployment/db-consumer | head -1
## Output: password=initial-password-v1
## --- The rotation script ---
## Generate a new random password
NEW_PASSWORD=$(openssl rand -base64 24)
echo "New password: $NEW_PASSWORD"
## Update the Secret using the dry-run + apply pattern
kubectl create secret generic rotatable-db-creds \
--from-literal=username=app-user \
--from-literal=password="$NEW_PASSWORD" \
--dry-run=client -o yaml | kubectl apply -f -
## Verify the Secret was updated
kubectl get secret rotatable-db-creds -o jsonpath='{.data.password}' | base64 --decode
echo ## Should match $NEW_PASSWORD
## Trigger a rolling restart so Pods pick up the new env var value
kubectl rollout restart deployment/db-consumer
## Wait for the rollout to complete
kubectl rollout status deployment/db-consumer --timeout=120s
## Verify the new Pods have the updated password
sleep 5
kubectl logs deployment/db-consumer | head -1
## Output: password=<new-random-password>
## Clean up
kubectl delete deployment db-consumer
kubectl delete secret rotatable-db-creds
Finalize & Cleanup¶
- To remove all resources created by this lab, delete the secrets-lab namespace:
## Delete the entire namespace (this removes ALL resources within it)
kubectl delete namespace secrets-lab
- Reset your kubectl context to the default namespace:
## Point the current context back at the default namespace
kubectl config set-context --current --namespace=default
- (Optional) Remove any local temporary files created during the lab:
## Clean up any leftover temporary files
rm -f /tmp/username.txt /tmp/password.txt /tmp/app-secrets.env
rm -f /tmp/tls.crt /tmp/tls.key
rm -f /tmp/exercise-tls.crt /tmp/exercise-tls.key
rm -f /tmp/special-chars.txt
Troubleshooting¶
- Secret not found:
Verify the Secret exists in the correct namespace:
## List Secrets in the current namespace
kubectl get secrets -n secrets-lab
## Check if you are in the right namespace
kubectl config view --minify -o jsonpath='{.contexts[0].context.namespace}'
- Base64 decode errors:
Ensure values were encoded without trailing newlines:
## Correct: no trailing newline
echo -n "myvalue" | base64
## Incorrect: includes trailing newline (will cause issues)
echo "myvalue" | base64
- Pod cannot access Secret:
Check RBAC permissions and ensure the Secret name matches:
## Verify the Secret exists
kubectl get secret <secret-name> -n secrets-lab
## Check Pod events for errors
kubectl describe pod <pod-name> -n secrets-lab | grep -A10 "Events:"
## Common error: "secret not found" -- verify the name matches exactly
kubectl get pod <pod-name> -n secrets-lab -o yaml | grep secretName
- Volume-mounted Secret not updating:
The kubelet syncs Secret updates with a configurable period (default: up to ~1 minute plus cache propagation delay). If using subPath mounts, the file is never updated:
## Check kubelet sync period (on the node)
## Default is --sync-frequency=1m
## IMPORTANT: subPath volume mounts do NOT receive automatic updates
## Avoid using subPath with Secrets if you need auto-rotation
- Immutable Secret cannot be updated:
This is expected behavior. Delete and recreate the Secret:
## Delete the immutable Secret
kubectl delete secret <secret-name> -n secrets-lab
## Recreate it with updated values
kubectl apply -f <manifest.yaml>
- TLS Secret creation fails:
Ensure the certificate and key files are valid and match:
## Verify the certificate
openssl x509 -in /tmp/tls.crt -text -noout
## Verify the key
openssl rsa -in /tmp/tls.key -check
## Verify the cert and key match (modulus should be identical)
openssl x509 -noout -modulus -in /tmp/tls.crt | openssl md5
openssl rsa -noout -modulus -in /tmp/tls.key | openssl md5
- imagePullSecrets not working:
Verify the Secret type and the registry URL:
## The Secret type must be kubernetes.io/dockerconfigjson
kubectl get secret <secret-name> -o jsonpath='{.type}'
## Decode and verify the registry URL matches your image registry
kubectl get secret <secret-name> -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode | python3 -m json.tool
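For reference, the payload a `kubernetes.io/dockerconfigjson` Secret carries can be assembled by hand, which makes the expected structure easy to compare against; this sketch uses a placeholder registry and credentials (not values from this lab):

```shell
# Build the .dockerconfigjson payload by hand (placeholder values)
REGISTRY=registry.example.com   # placeholder: your registry URL
USERNAME=user                   # placeholder credentials
PASSWORD=s3cr3t
# The "auth" field is base64(username:password)
AUTH=$(printf '%s:%s' "$USERNAME" "$PASSWORD" | base64)
printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
  "$REGISTRY" "$USERNAME" "$PASSWORD" "$AUTH"
```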
Next Steps¶
- Explore External Secrets Operator to sync Secrets from external secret managers (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault, GCP Secret Manager).
- Try Sealed Secrets by Bitnami to safely store encrypted Secrets in Git repositories.
- Learn about HashiCorp Vault and the Vault Agent Injector for dynamic secret management in Kubernetes.
- Set up cert-manager to automate TLS certificate issuance and renewal with Let’s Encrypt.
- Implement Kubernetes RBAC policies to restrict Secret access to only the ServiceAccounts and users that need it.
- Enable Kubernetes Audit Logging to monitor who accesses Secrets in your cluster.
- Explore SOPS (Secrets OPerationS) for encrypting Secret manifests before committing to Git.
- Try the CSI Secrets Store Driver to mount Secrets from external stores as volumes without Kubernetes Secret objects.
NetworkPolicies - Pod-Level Firewall Rules¶
- In this lab we will learn how to use Kubernetes NetworkPolicies to control traffic flow between pods, namespaces, and external endpoints - effectively creating firewall rules at the pod level.
What will we learn?¶
- What NetworkPolicies are and how they work
- Default behavior: all pods can talk to all pods (open by default)
- How to create ingress and egress rules
- How to isolate namespaces from each other
- How to allow traffic only from specific pods using `podSelector`
- How to allow traffic only from specific namespaces using `namespaceSelector`
- How to restrict egress to specific CIDR blocks
- How to implement a default-deny policy
- Testing network policies with real traffic
Official Documentation & References¶
| Resource | Link |
|---|---|
| Network Policies | kubernetes.io/docs |
| Declare Network Policy | kubernetes.io/docs |
| NetworkPolicy API Reference | kubernetes.io/docs |
Prerequisites¶
- A running Kubernetes cluster with a CNI plugin that supports NetworkPolicies (e.g., Calico, Cilium, Weave Net)
- `kubectl` configured against the cluster
Important: CNI Plugin Required
The default Kind/Minikube cluster with the default CNI (kindnet/bridge) does not enforce NetworkPolicies. You need a CNI that supports them:
# Create a Kind cluster without default CNI
kind create cluster --name netpol-lab --config manifests/kind-config.yaml
# Install Calico
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml
# Wait for Calico to be ready
kubectl wait --for=condition=Ready pods -l k8s-app=calico-node -n kube-system --timeout=120s
NetworkPolicy Overview¶
graph TB
subgraph cluster["Kubernetes Cluster"]
subgraph ns_a["Namespace: frontend"]
pod_fe["Pod: frontend\nlabel: app=frontend"]
end
subgraph ns_b["Namespace: backend"]
pod_be["Pod: backend\nlabel: app=backend"]
pod_db["Pod: database\nlabel: app=database"]
end
np["NetworkPolicy\non: backend namespace\nallow ingress from:\n app=frontend"]
end
pod_fe -- "✅ Allowed" --> pod_be
pod_be -- "✅ Allowed" --> pod_db
pod_fe -. "❌ Denied" .-> pod_db
np -.-> pod_be
np -.-> pod_db
| Concept | Description |
|---|---|
| Ingress rule | Controls incoming traffic to selected pods |
| Egress rule | Controls outgoing traffic from selected pods |
| podSelector | Selects which pods the policy applies to (by labels) |
| Default deny | When a policy selects a pod, all non-matching traffic is denied |
01. Setup: Create namespaces and test pods¶
# Clean up
kubectl delete namespace netpol-frontend netpol-backend --ignore-not-found
# Create namespaces with labels (needed for namespaceSelector)
kubectl create namespace netpol-frontend
kubectl label namespace netpol-frontend role=frontend
kubectl create namespace netpol-backend
kubectl label namespace netpol-backend role=backend
Deploy test workloads:
# manifests/test-pods.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: web
namespace: netpol-backend
spec:
replicas: 1
selector:
matchLabels:
app: web
template:
metadata:
labels:
app: web
spec:
containers:
- name: nginx
image: nginx:alpine
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: web
namespace: netpol-backend
spec:
selector:
app: web
ports:
- port: 80
---
apiVersion: v1
kind: Pod
metadata:
name: client
namespace: netpol-frontend
labels:
app: client
spec:
containers:
- name: curl
image: curlimages/curl:latest
command: ["sleep", "3600"]
---
apiVersion: v1
kind: Pod
metadata:
name: rogue
namespace: netpol-frontend
labels:
app: rogue
spec:
containers:
- name: curl
image: curlimages/curl:latest
command: ["sleep", "3600"]
kubectl apply -f manifests/test-pods.yaml
# Wait for pods to be ready
kubectl wait --for=condition=Ready pod/client -n netpol-frontend --timeout=60s
kubectl wait --for=condition=Ready pod/rogue -n netpol-frontend --timeout=60s
kubectl wait --for=condition=Ready -l app=web pod -n netpol-backend --timeout=60s
02. Verify: Default behavior (all-open)¶
Without any NetworkPolicy, all pods can communicate:
# client can reach the web service
kubectl exec client -n netpol-frontend -- \
curl -s --max-time 3 web.netpol-backend.svc.cluster.local
# Expected: nginx welcome page HTML
# rogue can also reach it
kubectl exec rogue -n netpol-frontend -- \
curl -s --max-time 3 web.netpol-backend.svc.cluster.local
# Expected: nginx welcome page HTML
03. Default Deny All Ingress¶
Apply a default deny policy - blocks ALL incoming traffic to pods in netpol-backend:
# manifests/default-deny-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-ingress
namespace: netpol-backend
spec:
podSelector: {} # Selects ALL pods in this namespace
policyTypes:
- Ingress # No ingress rules = deny all incoming traffic
Test - both pods should now be blocked:
# Both should time out
kubectl exec client -n netpol-frontend -- \
curl -s --max-time 3 web.netpol-backend.svc.cluster.local
# Expected: timeout / connection refused
kubectl exec rogue -n netpol-frontend -- \
curl -s --max-time 3 web.netpol-backend.svc.cluster.local
# Expected: timeout / connection refused
04. Allow traffic from specific pods (podSelector)¶
Allow only pods with label app=client from the netpol-frontend namespace:
# manifests/allow-client-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-client-ingress
namespace: netpol-backend
spec:
podSelector:
matchLabels:
app: web
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
role: frontend
podSelector:
matchLabels:
app: client
ports:
- protocol: TCP
port: 80
AND vs OR in NetworkPolicy selectors
When namespaceSelector and podSelector are in the same from entry (no dash between them), they are combined with AND - both must match. If they were separate entries (each with its own dash), they would be OR.
Test:
# client should work (matches app=client in frontend namespace)
kubectl exec client -n netpol-frontend -- \
curl -s --max-time 3 web.netpol-backend.svc.cluster.local
# Expected: nginx welcome page HTML ✅
# rogue should still be blocked (has app=rogue, not app=client)
kubectl exec rogue -n netpol-frontend -- \
curl -s --max-time 3 web.netpol-backend.svc.cluster.local
# Expected: timeout ❌
05. Egress policy - Restrict outgoing traffic¶
Restrict pods in netpol-backend to only communicate with DNS and internal cluster IPs:
# manifests/restrict-egress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: restrict-egress
namespace: netpol-backend
spec:
podSelector:
matchLabels:
app: web
policyTypes:
- Egress
egress:
# Allow DNS resolution
- to: []
ports:
- protocol: UDP
port: 53
- protocol: TCP
port: 53
# Allow traffic to pods within the same namespace
- to:
- namespaceSelector:
matchLabels:
role: backend
06. Default Deny All (Ingress + Egress)¶
The most restrictive baseline - deny everything, then allowlist:
# manifests/default-deny-all.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: netpol-backend
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
# Apply the policy (removes all the previous specific policies first)
kubectl delete networkpolicy --all -n netpol-backend
kubectl apply -f manifests/default-deny-all.yaml
Production Pattern
The recommended production approach:
- Start with a default-deny-all policy in each namespace
- Add specific allow policies for each required communication path
- This is a “whitelist” approach - explicit is better than implicit
07. Inspect and debug NetworkPolicies¶
# List all network policies in a namespace
kubectl get networkpolicy -n netpol-backend
# Describe a specific policy to see its rules
kubectl describe networkpolicy allow-client-ingress -n netpol-backend
# Check which pods are selected by a policy
kubectl get pods -n netpol-backend -l app=web
08. Cleanup¶
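Everything this lab created lives in the two namespaces from step 01, so deleting them removes all pods, services, and NetworkPolicies:

```shell
# Delete the lab namespaces (removes pods, services, and NetworkPolicies)
kubectl delete namespace netpol-frontend netpol-backend
# If you created the dedicated Kind cluster for this lab, remove it as well
kind delete cluster --name netpol-lab
```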
Summary¶
| Concept | Key Takeaway |
|---|---|
| Default behavior | Without NetworkPolicies, all pods can communicate freely |
| Default deny | podSelector: {} with no rules = deny all |
| Ingress rules | Control who can send traffic TO your pods |
| Egress rules | Control where your pods can send traffic |
| podSelector | Match traffic sources/destinations by pod labels |
| namespaceSelector | Match traffic sources/destinations by namespace labels |
| AND vs OR | Same from entry = AND; separate from entries = OR |
| Best practice | Default deny + explicit allow (whitelist approach) |
Exercises¶
The following exercises will test your understanding of Kubernetes NetworkPolicies. Try to solve each exercise on your own before revealing the solution.
01. Allow Ingress Only on a Specific Port¶
Create a NetworkPolicy that allows ingress to pods labeled app=api only on port 8080 (TCP), denying traffic on all other ports.
Scenario:¶
- Your API server listens on port 8080 but also has a debug port 9090 that should never be accessible.
- You need to ensure only port 8080 is reachable from other pods.
Hint: Use spec.ingress.ports to specify the allowed port. All other ports will be denied once a policy selects the pod.
Solution
## Create the namespace
kubectl create namespace netpol-exercise --dry-run=client -o yaml | kubectl apply -f -
## Deploy an API pod that listens on two ports
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: api-server
namespace: netpol-exercise
labels:
app: api
spec:
containers:
- name: api
image: nginx:alpine
ports:
- containerPort: 80
EOF
## Deploy a client pod
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: test-client
namespace: netpol-exercise
labels:
app: client
spec:
containers:
- name: curl
image: curlimages/curl:latest
command: ["sleep", "3600"]
EOF
## Wait for pods
kubectl wait --for=condition=Ready pod/api-server -n netpol-exercise --timeout=60s
kubectl wait --for=condition=Ready pod/test-client -n netpol-exercise --timeout=60s
## Apply the NetworkPolicy allowing only port 8080
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-api-port-only
namespace: netpol-exercise
spec:
podSelector:
matchLabels:
app: api
policyTypes:
- Ingress
ingress:
- ports:
- protocol: TCP
port: 8080
EOF
## Test: port 8080 would be allowed (if it were listening)
## Port 80 should be denied by the policy
kubectl exec test-client -n netpol-exercise -- \
curl -s --max-time 3 api-server:80 2>&1 || echo "Blocked as expected"
## Clean up
kubectl delete namespace netpol-exercise
02. Allow Traffic Between Specific Namespaces Only¶
Create two namespaces (team-a and team-b) and a NetworkPolicy that allows pods in team-b to receive ingress only from pods in team-a.
Scenario:¶
- Team A runs a frontend that needs to call Team B’s backend service.
- No other namespaces should be able to reach Team B’s pods.
Hint: Label the namespaces and use namespaceSelector in the ingress rule.
Solution
## Create namespaces with labels
kubectl create namespace team-a
kubectl label namespace team-a team=a
kubectl create namespace team-b
kubectl label namespace team-b team=b
## Deploy a backend in team-b
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: backend
namespace: team-b
labels:
app: backend
spec:
containers:
- name: nginx
image: nginx:alpine
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: backend
namespace: team-b
spec:
selector:
app: backend
ports:
- port: 80
EOF
## Deploy a client in team-a
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: frontend
namespace: team-a
labels:
app: frontend
spec:
containers:
- name: curl
image: curlimages/curl:latest
command: ["sleep", "3600"]
EOF
## Wait for pods
kubectl wait --for=condition=Ready pod/backend -n team-b --timeout=60s
kubectl wait --for=condition=Ready pod/frontend -n team-a --timeout=60s
## Apply default deny + allow from team-a only
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: deny-all-ingress
namespace: team-b
spec:
podSelector: {}
policyTypes:
- Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-from-team-a
namespace: team-b
spec:
podSelector:
matchLabels:
app: backend
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
team: a
ports:
- protocol: TCP
port: 80
EOF
## Test: frontend in team-a can reach backend
kubectl exec frontend -n team-a -- \
curl -s --max-time 3 backend.team-b.svc.cluster.local
## Expected: nginx welcome page ✅
## Clean up
kubectl delete namespace team-a team-b
03. Create an Egress Policy That Only Allows DNS and HTTPS¶
Create a NetworkPolicy for pods labeled app=secure-app that only allows egress to DNS (UDP/TCP port 53) and HTTPS (TCP port 443).
Scenario:¶
- Your application needs to resolve DNS names and make HTTPS API calls.
- All other outbound traffic (HTTP, SSH, database ports) must be blocked.
Hint: Use multiple entries in spec.egress with specific port rules.
Solution
## Create namespace
kubectl create namespace egress-test
## Deploy the pod
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: secure-app
namespace: egress-test
labels:
app: secure-app
spec:
containers:
- name: curl
image: curlimages/curl:latest
command: ["sleep", "3600"]
EOF
kubectl wait --for=condition=Ready pod/secure-app -n egress-test --timeout=60s
## Apply the egress policy
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: restrict-egress-dns-https
namespace: egress-test
spec:
podSelector:
matchLabels:
app: secure-app
policyTypes:
- Egress
egress:
# Allow DNS resolution
- ports:
- protocol: UDP
port: 53
- protocol: TCP
port: 53
# Allow HTTPS traffic
- ports:
- protocol: TCP
port: 443
EOF
## Test: HTTPS should work (port 443)
kubectl exec secure-app -n egress-test -- \
curl -s --max-time 5 -o /dev/null -w "%{http_code}" https://kubernetes.default.svc:443 -k 2>&1 || echo "May vary by cluster"
## Test: HTTP on port 80 should be blocked
kubectl exec secure-app -n egress-test -- \
curl -s --max-time 3 http://kubernetes.default.svc:80 2>&1 || echo "Blocked as expected"
## Clean up
kubectl delete namespace egress-test
04. Implement a Zero-Trust Network Model¶
Implement a complete zero-trust model for a three-tier application: frontend (port 80), backend (port 8080), and database (port 5432). Each tier can only communicate with its adjacent tier.
Scenario:¶
- Frontend pods can receive traffic from anywhere but can only talk to backend pods.
- Backend pods can only receive traffic from frontend pods and can only talk to database pods.
- Database pods can only receive traffic from backend pods and cannot make any outbound connections.
Hint: Start with default-deny-all in the namespace, then add specific allow policies for each tier.
Solution
## Create namespace
kubectl create namespace zero-trust
## Deploy all three tiers
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: frontend
namespace: zero-trust
labels:
tier: frontend
spec:
containers:
- name: nginx
image: nginx:alpine
ports:
- containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
name: backend
namespace: zero-trust
labels:
tier: backend
spec:
containers:
- name: nginx
image: nginx:alpine
ports:
- containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
name: database
namespace: zero-trust
labels:
tier: database
spec:
containers:
- name: nginx
image: nginx:alpine
ports:
- containerPort: 80
EOF
## Wait for pods
kubectl wait --for=condition=Ready pod -l tier -n zero-trust --timeout=60s
## Step 1: Default deny all traffic
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: zero-trust
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
EOF
## Step 2: Frontend can receive from anywhere, egress to backend + DNS
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: frontend-policy
namespace: zero-trust
spec:
podSelector:
matchLabels:
tier: frontend
policyTypes:
- Ingress
- Egress
ingress:
- {}
egress:
- to:
- podSelector:
matchLabels:
tier: backend
- ports:
- protocol: UDP
port: 53
- protocol: TCP
port: 53
EOF
## Step 3: Backend receives from frontend, egress to database + DNS
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: backend-policy
namespace: zero-trust
spec:
podSelector:
matchLabels:
tier: backend
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
tier: frontend
egress:
- to:
- podSelector:
matchLabels:
tier: database
- ports:
- protocol: UDP
port: 53
- protocol: TCP
port: 53
EOF
## Step 4: Database receives from backend only, no egress
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: database-policy
namespace: zero-trust
spec:
podSelector:
matchLabels:
tier: database
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
tier: backend
egress: []
EOF
## Verify policies
kubectl get networkpolicy -n zero-trust
## Clean up
kubectl delete namespace zero-trust
Troubleshooting¶
- NetworkPolicy has no effect:
Verify your CNI plugin supports NetworkPolicies:
## Check which CNI is installed
kubectl get pods -n kube-system | grep -E "calico|cilium|weave"
## If using Kind with default CNI (kindnet), NetworkPolicies are NOT enforced
## Reinstall with Calico:
## kind delete cluster && kind create cluster --config manifests/kind-config.yaml
## kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml
- Pods cannot resolve DNS after applying deny policy:
Default-deny policies block DNS (port 53). Always include a DNS allow rule in egress policies:
## Add DNS egress to your policy:
## egress:
## - ports:
## - protocol: UDP
## port: 53
## - protocol: TCP
## port: 53
- Traffic still getting through after applying deny policy:
Check that the policy selects the correct pods and is in the correct namespace:
## Verify the policy is applied
kubectl get networkpolicy -n <namespace>
## Describe the policy to see which pods it selects
kubectl describe networkpolicy <policy-name> -n <namespace>
## Verify pod labels match
kubectl get pods -n <namespace> --show-labels
- Cannot determine if AND or OR logic is being used:
Remember: items in the same from entry (same -) are ANDed; separate from entries (separate -) are ORed:
## AND (both must match):
ingress:
- from:
- namespaceSelector:
matchLabels:
role: frontend
podSelector: # No dash - same entry = AND
matchLabels:
app: client
## OR (either can match):
ingress:
- from:
- namespaceSelector:
matchLabels:
role: frontend
- podSelector: # Dash - separate entry = OR
matchLabels:
app: client
- Testing connectivity between pods:
Use curl or wget with timeouts to test connectivity:
## Using curl with timeout
kubectl exec <pod> -n <namespace> -- curl -s --max-time 3 <service>.<namespace>.svc.cluster.local
## Using wget (if curl not available)
kubectl exec <pod> -n <namespace> -- wget -qO- --timeout=3 <service>.<namespace>.svc.cluster.local
## Using nc (netcat) for specific ports
kubectl exec <pod> -n <namespace> -- nc -zv -w 3 <service> <port>
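For repeated checks, a small hypothetical `check` helper (not part of the lab manifests) prints PASS/FAIL instead of raw HTML, which makes a connectivity matrix easier to read:

```shell
# Hypothetical helper: PASS/FAIL connectivity check from a pod to a target URL
check() {
  # $1 = pod name, $2 = namespace, $3 = target URL
  if kubectl exec "$1" -n "$2" -- curl -s --max-time 3 -o /dev/null "$3"; then
    echo "PASS: $1 ($2) -> $3"
  else
    echo "FAIL: $1 ($2) -> $3"
  fi
}

check client netpol-frontend web.netpol-backend.svc.cluster.local
check rogue  netpol-frontend web.netpol-backend.svc.cluster.local
```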
Next Steps¶
- Explore Cilium NetworkPolicies for advanced L7 (HTTP, gRPC) filtering and DNS-aware policies.
- Learn about Calico NetworkPolicies for enterprise-grade network security with global policies and FQDN-based rules.
- Try Network Policy Editor - a visual tool for building and visualizing NetworkPolicies.
- Implement Kubernetes Security Best Practices combining NetworkPolicies with RBAC, Pod Security Standards, and Secrets management.
- Explore service mesh solutions (Istio, Linkerd) for mTLS and L7 traffic management on top of NetworkPolicies.
ResourceQuotas & LimitRanges - Multi-Tenant Resource Control¶
- In this lab we will learn how to use ResourceQuotas and LimitRanges to control and limit resource consumption per namespace, ensuring fair sharing in multi-tenant Kubernetes clusters.
What will we learn?¶
- What ResourceQuotas and LimitRanges are
- How to set CPU, memory, and object count limits per namespace
- How to enforce default resource requests/limits for every container
- How to prevent namespace resource starvation
- How ResourceQuotas and LimitRanges work together
- How to monitor quota usage
- Best practices for multi-tenant clusters
Official Documentation & References¶
| Resource | Link |
|---|---|
| Resource Quotas | kubernetes.io/docs |
| Limit Ranges | kubernetes.io/docs |
| Managing Resources for Containers | kubernetes.io/docs |
| Configure Quotas for API Objects | kubernetes.io/docs |
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster
Overview¶
graph TB
subgraph cluster["Kubernetes Cluster"]
subgraph ns_a["Namespace: team-alpha"]
rq_a["ResourceQuota\nCPU: 2 cores\nMemory: 4Gi\nPods: 10"]
lr_a["LimitRange\nDefault CPU: 200m\nDefault Mem: 256Mi"]
pods_a["Pods (within limits)"]
end
subgraph ns_b["Namespace: team-beta"]
rq_b["ResourceQuota\nCPU: 4 cores\nMemory: 8Gi\nPods: 20"]
lr_b["LimitRange\nDefault CPU: 500m\nDefault Mem: 512Mi"]
pods_b["Pods (within limits)"]
end
end
rq_a --> pods_a
lr_a --> pods_a
rq_b --> pods_b
lr_b --> pods_b
| Resource | Scope | Purpose |
|---|---|---|
| ResourceQuota | Namespace | Caps the total resources a namespace can consume |
| LimitRange | Namespace | Sets per-container default and max resource constraints |
01. Create namespaces¶
# Clean up
kubectl delete namespace quota-lab --ignore-not-found
# Create lab namespace
kubectl create namespace quota-lab
02. Create a ResourceQuota¶
A ResourceQuota limits the total resources consumed by all pods in a namespace:
# manifests/resource-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: compute-quota
namespace: quota-lab
spec:
hard:
# Compute resources
requests.cpu: "2" # Total CPU requests across all pods
requests.memory: 4Gi # Total memory requests across all pods
limits.cpu: "4" # Total CPU limits across all pods
limits.memory: 8Gi # Total memory limits across all pods
# Object count limits
pods: "10" # Maximum number of pods
services: "5" # Maximum number of services
configmaps: "10" # Maximum number of configmaps
secrets: "10" # Maximum number of secrets
persistentvolumeclaims: "5"
kubectl apply -f manifests/resource-quota.yaml
# Check the quota
kubectl get resourcequota -n quota-lab
kubectl describe resourcequota compute-quota -n quota-lab
Expected output:
Name: compute-quota
Namespace: quota-lab
Resource Used Hard
-------- ---- ----
configmaps 1 10
limits.cpu 0 4
limits.memory 0 8Gi
persistentvolumeclaims 0 5
pods 0 10
requests.cpu 0 2
requests.memory 0 4Gi
secrets 0 10
services 0 5
03. Deploy a pod without resource requests (will fail!)¶
Important
When a ResourceQuota that constrains compute resources (any `requests.*` or `limits.*` entry) is active in a namespace, every container must specify the corresponding resource requests and limits. Pods without them are rejected at admission.
# This will FAIL because no resource requests/limits are specified
kubectl run test-no-limits --image=nginx -n quota-lab
# Error: pods "test-no-limits" is forbidden: failed quota: compute-quota:
# must specify limits.cpu, limits.memory, requests.cpu, requests.memory
04. Deploy a pod with resource requests¶
# manifests/pod-with-resources.yaml
apiVersion: v1
kind: Pod
metadata:
name: web-server
namespace: quota-lab
spec:
containers:
- name: nginx
image: nginx:alpine
resources:
requests:
cpu: 250m
memory: 128Mi
limits:
cpu: 500m
memory: 256Mi
kubectl apply -f manifests/pod-with-resources.yaml
# Check quota usage - resources are now counted
kubectl describe resourcequota compute-quota -n quota-lab
Expected:
Resource Used Hard
-------- ---- ----
limits.cpu 500m 4
limits.memory 256Mi 8Gi
pods 1 10
requests.cpu 250m 2
requests.memory 128Mi 4Gi
05. Exceed the quota (will fail!)¶
Try to deploy more pods than the quota allows:
# manifests/deployment-exceeds-quota.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: hungry-app
namespace: quota-lab
spec:
replicas: 5
selector:
matchLabels:
app: hungry
template:
metadata:
labels:
app: hungry
spec:
containers:
- name: app
image: nginx:alpine
resources:
requests:
cpu: 500m # 5 replicas × 500m = 2.5 CPU > 2 CPU quota
memory: 1Gi # 5 replicas × 1Gi = 5Gi > 4Gi quota
limits:
cpu: "1"
memory: 2Gi
kubectl apply -f manifests/deployment-exceeds-quota.yaml
# Check how many pods were actually created
kubectl get pods -n quota-lab -l app=hungry
# Only 4 of the 5 replicas are created (4 × 500m = 2 CPU and 4 × 1Gi = 4Gi,
# both exactly at quota) - the 5th is blocked
# Check events for the ReplicaSet to see the quota error
kubectl get events -n quota-lab --field-selector reason=FailedCreate --sort-by='.lastTimestamp'
# "exceeded quota: compute-quota"
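As a sanity check on the numbers above, the admission math can be reproduced directly (values copied from the quota and the Deployment manifest):

```shell
# How many replicas fit under the ResourceQuota?
QUOTA_CPU_M=2000   # requests.cpu: "2"    -> 2000 millicores
QUOTA_MEM_MI=4096  # requests.memory: 4Gi -> 4096 MiB
REQ_CPU_M=500      # per-replica CPU request (500m)
REQ_MEM_MI=1024    # per-replica memory request (1Gi)
BY_CPU=$((QUOTA_CPU_M / REQ_CPU_M))
BY_MEM=$((QUOTA_MEM_MI / REQ_MEM_MI))
if [ "$BY_CPU" -lt "$BY_MEM" ]; then FIT=$BY_CPU; else FIT=$BY_MEM; fi
echo "replicas admitted: $FIT"  # prints: replicas admitted: 4
```

The fifth replica would push both totals past the quota, so its Pod creation fails with the `exceeded quota` event shown above.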
06. Create a LimitRange¶
A LimitRange sets per-container defaults and constraints - so pods without explicit resources still get reasonable limits:
# manifests/limit-range.yaml
apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
namespace: quota-lab
spec:
limits:
- type: Container
default: # Default limits (if not specified)
cpu: 300m
memory: 256Mi
defaultRequest: # Default requests (if not specified)
cpu: 100m
memory: 128Mi
max: # Maximum allowed per container
cpu: "1"
memory: 1Gi
min: # Minimum required per container
cpu: 50m
memory: 64Mi
- type: Pod
max: # Maximum total resources per pod
cpu: "2"
memory: 2Gi
# Clean up previous resources
kubectl delete deployment hungry-app -n quota-lab --ignore-not-found
kubectl delete pod web-server -n quota-lab --ignore-not-found
kubectl apply -f manifests/limit-range.yaml
# Inspect the LimitRange
kubectl describe limitrange default-limits -n quota-lab
07. Deploy a pod without resource specs (gets defaults from LimitRange)¶
# Now this works! LimitRange injects default requests/limits
kubectl run auto-limited --image=nginx:alpine -n quota-lab
# Wait for it
kubectl wait --for=condition=Ready pod/auto-limited -n quota-lab --timeout=60s
# Check the pod - it has resources injected automatically
kubectl get pod auto-limited -n quota-lab -o jsonpath='{.spec.containers[0].resources}' | python3 -m json.tool
Expected output:
{
"limits": {
"cpu": "300m",
"memory": "256Mi"
},
"requests": {
"cpu": "100m",
"memory": "128Mi"
}
}
08. Try to exceed per-container max (will fail!)¶
# manifests/pod-exceeds-limit.yaml
apiVersion: v1
kind: Pod
metadata:
name: greedy-pod
namespace: quota-lab
spec:
containers:
- name: app
image: nginx:alpine
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: "2" # Exceeds LimitRange max of 1 CPU
memory: 2Gi # Exceeds LimitRange max of 1Gi
kubectl apply -f manifests/pod-exceeds-limit.yaml
# Error: pods "greedy-pod" is forbidden:
# [maximum cpu usage per Container is 1, but limit is 2,
# maximum memory usage per Container is 1Gi, but limit is 2Gi]
09. Monitor quota usage¶
# View quota usage across all namespaces
kubectl get resourcequota --all-namespaces
# Detailed usage for our namespace
kubectl describe resourcequota compute-quota -n quota-lab
# View in JSON/YAML for programmatic access
kubectl get resourcequota compute-quota -n quota-lab -o yaml
# Check if a specific resource is approaching limits
kubectl get resourcequota compute-quota -n quota-lab \
  -o jsonpath='{.status.used}'
10. Scope-based quotas (optional advanced topic)¶
ResourceQuotas can be scoped to specific priority classes or pod states:
# manifests/scoped-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: best-effort-quota
namespace: quota-lab
spec:
hard:
pods: "5"
scopeSelector:
matchExpressions:
- operator: In
scopeName: PriorityClass
values: ["low-priority"]
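Note that the scope above references a `low-priority` PriorityClass that this lab never creates; a minimal one might look like this (the `value` of 1000 is an arbitrary low priority chosen for illustration):

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1000
globalDefault: false
description: "Low-priority workloads counted by best-effort-quota"
EOF
```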
11. Cleanup¶
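All lab resources live in the `quota-lab` namespace, so a single delete removes everything:

```shell
# Delete the lab namespace (removes the ResourceQuota, LimitRange, and pods)
kubectl delete namespace quota-lab
```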
Summary¶
| Concept | Key Takeaway |
|---|---|
| ResourceQuota | Caps total namespace resource consumption (CPU, memory, objects) |
| LimitRange | Sets per-container defaults, min, and max |
| Together | LimitRange provides defaults → ResourceQuota enforces totals |
| No resources = rejected | With a ResourceQuota, pods must declare resources |
| Defaults injection | LimitRange automatically adds requests/limits to naked pods |
| Object count quotas | Limit pods, services, secrets, PVCs, etc. per namespace |
| Best practice | Always use both ResourceQuota + LimitRange in multi-tenant clusters |
Exercises¶
The following exercises will test your understanding of ResourceQuotas and LimitRanges. Try to solve each exercise on your own before revealing the solution.
01. Create a Quota That Limits Only Object Counts¶
Create a ResourceQuota named object-count-quota that limits the namespace to a maximum of 3 pods, 2 services, and 2 configmaps - without any CPU or memory restrictions.
Scenario:¶
β¦ You want to prevent teams from accidentally creating too many resources. β¦ CPU and memory are managed by the LimitRange, so the quota only needs to count objects.
Hint: Use the pods, services, and configmaps fields under spec.hard without any requests.* or limits.* fields.
Solution
## Create namespace
kubectl create namespace count-quota-test
## Apply the object-count-only quota
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
name: object-count-quota
namespace: count-quota-test
spec:
hard:
pods: "3"
services: "2"
configmaps: "2"
EOF
## Verify
kubectl describe resourcequota object-count-quota -n count-quota-test
## Create pods (no resource requests needed since no CPU/memory quota)
kubectl run pod1 --image=nginx:alpine -n count-quota-test
kubectl run pod2 --image=nginx:alpine -n count-quota-test
kubectl run pod3 --image=nginx:alpine -n count-quota-test
## This 4th pod should be rejected
kubectl run pod4 --image=nginx:alpine -n count-quota-test 2>&1 || echo "Rejected: quota exceeded"
## Check usage
kubectl describe resourcequota object-count-quota -n count-quota-test
## Clean up
kubectl delete namespace count-quota-test
02. Use LimitRange to Set Max Resource Per Pod¶
Create a LimitRange that sets the maximum total CPU per pod to 1 core and maximum total memory per pod to 1Gi. Deploy a pod with two containers that together exceed these limits.
Scenario:¶
- You want to prevent any single pod from consuming too many resources.
- The limit applies to the sum of all containers in the pod, not per container.
Hint: Use type: Pod in the LimitRange with max settings.
Solution
## Create namespace
kubectl create namespace pod-limit-test
## Create the pod-level LimitRange
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: LimitRange
metadata:
name: pod-max-limits
namespace: pod-limit-test
spec:
limits:
- type: Pod
max:
cpu: "1"
memory: 1Gi
- type: Container
defaultRequest:
cpu: 100m
memory: 128Mi
default:
cpu: 200m
memory: 256Mi
EOF
## This pod should work (total: 400m CPU, 512Mi memory)
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: within-limits
namespace: pod-limit-test
spec:
containers:
- name: app1
image: nginx:alpine
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 200m
memory: 256Mi
- name: app2
image: nginx:alpine
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 200m
memory: 256Mi
EOF
echo "Pod within-limits created successfully"
## This pod should be REJECTED (total: 1200m CPU > 1 core max)
cat <<'EOF' | kubectl apply -f - 2>&1 || echo "Rejected: exceeds pod max"
apiVersion: v1
kind: Pod
metadata:
name: exceeds-limits
namespace: pod-limit-test
spec:
containers:
- name: app1
image: nginx:alpine
resources:
requests:
cpu: 400m
memory: 256Mi
limits:
cpu: 600m
memory: 512Mi
- name: app2
image: nginx:alpine
resources:
requests:
cpu: 400m
memory: 256Mi
limits:
cpu: 600m
memory: 512Mi
EOF
## Clean up
kubectl delete namespace pod-limit-test
03. Monitor Quota Usage and Set Up Alerts¶
Write a script that checks quota usage across all namespaces and warns when any resource exceeds 80% utilization.
Scenario:¶
- You are responsible for cluster operations and need early warning when namespaces approach their quotas.
- You want a simple script (no Prometheus needed) to check current usage.
Hint: Use kubectl get resourcequota -A -o json and parse the status.used vs status.hard fields.
Solution
## Create a test namespace with quota for demonstration
kubectl create namespace quota-monitor-test
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
name: test-quota
namespace: quota-monitor-test
spec:
hard:
pods: "5"
requests.cpu: "1"
requests.memory: 1Gi
EOF
## Create some pods to use quota
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: consumer1
namespace: quota-monitor-test
spec:
containers:
- name: app
image: nginx:alpine
resources:
requests:
cpu: 400m
memory: 512Mi
limits:
cpu: 400m
memory: 512Mi
EOF
## Save the monitoring script to a file and run it
cat <<'SCRIPT' > quota-monitor.sh
#!/bin/bash
## quota-monitor.sh - Check ResourceQuota usage across all namespaces
THRESHOLD=80
echo "=== ResourceQuota Usage Report ==="
echo ""
kubectl get resourcequota -A -o json | python3 -c "
import json, sys
data = json.load(sys.stdin)
threshold = $THRESHOLD
def parse_val(v):
    # Simplified parser - handles plain integers plus 'm', 'Mi' and 'Gi' suffixes
    v = str(v)
    if v.endswith('Gi'): return float(v[:-2]) * 1024
    if v.endswith('Mi'): return float(v[:-2])
    if v.endswith('m'): return float(v[:-1]) / 1000
    try: return float(v)
    except ValueError: return 0
for item in data.get('items', []):
    ns = item['metadata']['namespace']
    name = item['metadata']['name']
    hard = item.get('status', {}).get('hard', {})
    used = item.get('status', {}).get('used', {})
    for resource in hard:
        h = hard[resource]
        u = used.get(resource, '0')
        hard_val = parse_val(h)
        used_val = parse_val(u)
        if hard_val > 0:
            pct = (used_val / hard_val) * 100
            status = '⚠️  WARNING' if pct >= threshold else '✅ OK'
            print(f'{status} {ns}/{name}: {resource} = {u}/{h} ({pct:.0f}%)')
"
SCRIPT
chmod +x quota-monitor.sh
./quota-monitor.sh
## Clean up
kubectl delete namespace quota-monitor-test
rm quota-monitor.sh
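The value parsing inside the monitoring script is simplified; here is the same idea as a standalone, runnable sketch (the function name is mine). Note that millicores must be divided by 1000 so that `400m` used against a hard limit of `1` core compares correctly:

```python
def parse_quota_value(v: str) -> float:
    """Convert a Kubernetes quantity string into a comparable float.

    Memory is normalized to Mi and CPU to whole cores. Simplified:
    real quantities also allow Ki/Ti/k/M/G suffixes.
    """
    v = str(v)
    if v.endswith("Gi"):
        return float(v[:-2]) * 1024   # Gi -> Mi
    if v.endswith("Mi"):
        return float(v[:-2])
    if v.endswith("m"):
        return float(v[:-1]) / 1000   # millicores -> cores
    return float(v)

# 400m CPU used out of a 1-core quota is 40% utilization
used, hard = parse_quota_value("400m"), parse_quota_value("1")
print(f"{used / hard:.0%}")  # 40%
```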
04. Create Quotas for Different Priority Classes¶
Create two PriorityClasses (high-priority and low-priority) and ResourceQuotas that limit how many pods of each priority class can run.
Scenario:¶
- Critical services use high-priority and should get up to 5 pods.
- Background jobs use low-priority and should be limited to 3 pods.
- This prevents low-priority workloads from consuming the namespace's pod quota.
Hint: Use scopeSelector with PriorityClass scope in the ResourceQuota.
Solution
## Create namespace
kubectl create namespace priority-quota-test
## Create PriorityClasses
cat <<'EOF' | kubectl apply -f -
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority
value: 1000
globalDefault: false
description: "High priority for critical services"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: low-priority
value: 100
globalDefault: false
description: "Low priority for background jobs"
EOF
## Create scoped quotas
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
name: high-priority-quota
namespace: priority-quota-test
spec:
hard:
pods: "5"
scopeSelector:
matchExpressions:
- operator: In
scopeName: PriorityClass
values: ["high-priority"]
---
apiVersion: v1
kind: ResourceQuota
metadata:
name: low-priority-quota
namespace: priority-quota-test
spec:
hard:
pods: "3"
scopeSelector:
matchExpressions:
- operator: In
scopeName: PriorityClass
values: ["low-priority"]
EOF
## Verify quotas
kubectl describe resourcequota -n priority-quota-test
## Deploy high-priority pods (kubectl run has no priority flag, so set it via --overrides)
for i in 1 2 3; do
  kubectl run high-$i --image=nginx:alpine -n priority-quota-test \
    --overrides='{"spec":{"priorityClassName":"high-priority"}}'
done
## Deploy low-priority pods
for i in 1 2 3; do
  kubectl run low-$i --image=nginx:alpine -n priority-quota-test \
    --overrides='{"spec":{"priorityClassName":"low-priority"}}'
done
## This 4th low-priority pod should be rejected
kubectl run low-4 --image=nginx:alpine -n priority-quota-test \
  --overrides='{"spec":{"priorityClassName":"low-priority"}}' 2>&1 || echo "Rejected: low-priority quota exceeded"
## Check quota usage
kubectl describe resourcequota -n priority-quota-test
## Clean up
kubectl delete namespace priority-quota-test
kubectl delete priorityclass high-priority low-priority
05. Verify LimitRange Defaults Are Applied Automatically¶
Create a LimitRange, deploy a pod without any resource specifications, and verify the pod received the default values from the LimitRange.
Scenario:¶
- New developers on your team forget to set resource requests/limits.
- The LimitRange ensures every container gets reasonable defaults.
- You need to prove the injection happens automatically.
Hint: Deploy a pod with kubectl run (no resources) and inspect the pod’s YAML to see injected values.
Solution
## Create namespace with LimitRange
kubectl create namespace defaults-test
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: LimitRange
metadata:
name: container-defaults
namespace: defaults-test
spec:
limits:
- type: Container
default:
cpu: 500m
memory: 256Mi
defaultRequest:
cpu: 200m
memory: 128Mi
EOF
## Deploy a pod WITHOUT any resource specifications
kubectl run naked-pod --image=nginx:alpine -n defaults-test
## Wait for it
kubectl wait --for=condition=Ready pod/naked-pod -n defaults-test --timeout=60s
## Inspect the pod - LimitRange should have injected defaults
kubectl get pod naked-pod -n defaults-test -o jsonpath='{.spec.containers[0].resources}' | python3 -m json.tool
## Expected output:
## {
## "limits": {
## "cpu": "500m",
## "memory": "256Mi"
## },
## "requests": {
## "cpu": "200m",
## "memory": "128Mi"
## }
## }
## Verify LimitRange is the source
kubectl describe limitrange container-defaults -n defaults-test
## Clean up
kubectl delete namespace defaults-test
Troubleshooting¶
- Pod rejected with “must specify limits/requests”:
A ResourceQuota is active and requires all pods to declare resources. Add resource requests/limits or create a LimitRange to inject defaults:
## Check if a ResourceQuota exists
kubectl get resourcequota -n <namespace>
## Quick fix: Add a LimitRange to inject defaults
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
namespace: <namespace>
spec:
limits:
- type: Container
default:
cpu: 200m
memory: 128Mi
defaultRequest:
cpu: 100m
memory: 64Mi
EOF
- Deployment stuck - ReplicaSet cannot create pods:
Check the ReplicaSet events for quota errors:
## Get the ReplicaSet name
kubectl get rs -n <namespace>
## Check events for quota errors
kubectl describe rs <replicaset-name> -n <namespace> | grep -A5 "Events:"
## Or check events directly
kubectl get events -n <namespace> --field-selector reason=FailedCreate --sort-by='.lastTimestamp'
- Pod rejected with “exceeds maximum” from LimitRange:
The container’s requests or limits exceed the LimitRange maximum. Reduce the resource values:
## Check the LimitRange constraints
kubectl describe limitrange -n <namespace>
## Show min/max for Container and Pod types
kubectl get limitrange -n <namespace> -o yaml
- Quota shows “Used” but pods are not running:
Quota counts even non-running pods. Check for failed or pending pods:
## List all pods (including non-running)
kubectl get pods -n <namespace> --field-selector=status.phase!=Running
## Delete failed pods to reclaim quota
kubectl delete pods --field-selector=status.phase=Failed -n <namespace>
- Cannot determine remaining quota capacity:
Compare Used vs Hard values to see remaining capacity:
## Detailed quota view
kubectl describe resourcequota -n <namespace>
## JSON output for programmatic access
kubectl get resourcequota -n <namespace> -o json | python3 -c "
import json, sys
data = json.load(sys.stdin)
for item in data.get('items', []):
status = item.get('status', {})
hard = status.get('hard', {})
used = status.get('used', {})
print(f\"Quota: {item['metadata']['name']}\")
for k in hard:
print(f' {k}: {used.get(k, \"0\")} / {hard[k]}')
"
Next Steps¶
- Explore Vertical Pod Autoscaler (VPA) to automatically adjust resource requests based on actual usage.
- Learn about Horizontal Pod Autoscaler (HPA) to scale replicas based on resource utilization.
- Study KEDA (Lab 30) for event-driven autoscaling that works alongside quotas.
- Explore Hierarchical Quotas for managing quotas in complex multi-tenant setups with namespace hierarchies.
- Try kube-resource-report for visualizing resource usage across namespaces.
- Implement PodDisruptionBudgets (Lab 17) alongside quotas to ensure availability during maintenance.
Storage & Config
Data Store: Secrets, ConfigMaps & Secret Management¶
Overview¶
In this lab we will learn how to manage application configuration in Kubernetes using Secrets and ConfigMaps, and then go beyond the basics with Sealed Secrets and the External Secrets Operator (ESO) for production-grade secret management.
| Resource | Purpose |
|---|---|
| Secret | Stores sensitive data (passwords, tokens, certificates, API keys) encoded in Base64 |
| ConfigMap | Stores non-sensitive configuration data (feature flags, connection strings, config files) |
| SealedSecret | Encrypted Secret safe to store in Git (Bitnami Sealed Secrets) |
| ExternalSecret | Syncs secrets from external providers (Vault, AWS, GCP, Azure) into Kubernetes |
Official Documentation & References¶
| Resource | Link |
|---|---|
| Kubernetes Secrets | kubernetes.io/docs |
| Kubernetes ConfigMaps | kubernetes.io/docs |
| Sealed Secrets GitHub | github.com/bitnami-labs/sealed-secrets |
| External Secrets Operator | external-secrets.io |
| Kubernetes Secrets Best Practices | kubernetes.io/docs |
| Encrypting Secrets at Rest | kubernetes.io/docs |
What will we learn?¶
Part 1 - Secrets & ConfigMaps Basics¶
- How to create Secrets and ConfigMaps (imperative & declarative)
- How to inject configuration into pods via environment variables
- How to mount configuration as files/volumes
- How to update and rotate Secrets/ConfigMaps
- Key differences between Secrets and ConfigMaps
- Best practices for managing configuration in Kubernetes
Part 2 - Advanced Secrets Management¶
- Why base64-encoded Secrets are not encryption (the problem)
- How Sealed Secrets (Bitnami) encrypts secrets for safe Git storage
- How the External Secrets Operator syncs secrets from external providers
- How to install and use the kubeseal CLI
- How to create SealedSecrets that can be committed to Git
- How to configure ExternalSecret resources with a SecretStore
- Best practices for secret management in production
Prerequisites¶
- A running Kubernetes cluster (kubectl cluster-info should work)
- kubectl configured against the cluster
- Helm installed (helm version) - needed for Part 2
- Docker installed (optional - only needed if you want to build the demo image yourself)
01. Create namespace¶
# If the namespace already exists and contains data from previous steps, let's clean it
kubectl delete namespace codewizard --ignore-not-found
# Create the desired namespace [codewizard]
kubectl create namespace codewizard
Note
- You can skip section 02 if you don’t wish to build and push your own Docker container.
- A pre-built image nirgeier/k8s-secrets-sample is available on Docker Hub.
02. Build the demo Docker container (Optional)¶
1. Write the server code¶
- For this demo we use a tiny Node.js HTTP server that reads configuration from environment variables and returns them in the response.
- Source file: resources/server.js
//
// server.js
//
// Get these values at runtime.
// The variables will be passed from the Dockerfile and later on from a K8S ConfigMap/Secret
const language = process.env.LANGUAGE,
      token = process.env.TOKEN;
require("http")
.createServer((request, response) => {
response.write(`Language: ${language}\n`);
response.write(`Token : ${token}\n`);
response.end(`\n`);
})
// Set the default port to 5000
.listen(process.env.PORT || 5000);
2. Write the Dockerfile¶
- If you wish, you can skip this and use the existing image: nirgeier/k8s-secrets-sample
- Source file: resources/Dockerfile
# Base Image
FROM node
# Exposed port - same port is defined in server.js
EXPOSE 5000
# The "configuration" which we pass in runtime
# The server will "read" those variables at run time and will print them out
ENV LANGUAGE=Hebrew
ENV TOKEN=Hard-To-Guess
# Copy the server to the container
COPY server.js .
# Start the server
ENTRYPOINT node server.js
3. Build the Docker container¶
# Replace `nirgeier` with your own Docker Hub username
docker build -t nirgeier/k8s-secrets-sample ./resources/
4. Test the container locally¶
# Run the container
docker run -d -p 5000:5000 --name server nirgeier/k8s-secrets-sample
# Get the response - values should come from the Dockerfile ENVs
curl 127.0.0.1:5000
# Expected response:
# Language: Hebrew
# Token : Hard-To-Guess
- Stop and remove the container when done:
- (Optional) Push the container to your Docker Hub account:
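The two bullets above have no commands; a sketch, assuming the container name `server` from the previous step and your own Docker Hub username in place of `nirgeier`:

```shell
# Stop and remove the local test container
docker stop server && docker rm server

# (Optional) Push the image to your Docker Hub account
docker push nirgeier/k8s-secrets-sample
```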
03. Deploy with hardcoded environment variables¶
In this step we will deploy the container with environment variables defined directly in the YAML - no Secrets or ConfigMaps yet.
1. Review the deployment & service file¶
- Source file: resources/variables-from-yaml.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: codewizard-secrets
namespace: codewizard
spec:
replicas: 1
selector:
matchLabels:
name: codewizard-secrets
template:
metadata:
labels:
name: codewizard-secrets
spec:
containers:
- name: secrets
image: nirgeier/k8s-secrets-sample
imagePullPolicy: Always
ports:
- containerPort: 5000
env:
- name: LANGUAGE
value: Hebrew
- name: TOKEN
value: Hard-To-Guess2
resources:
limits:
cpu: "500m"
memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
name: codewizard-secrets
namespace: codewizard
spec:
selector:
name: codewizard-secrets
ports:
- protocol: TCP
port: 5000
targetPort: 5000
2. Deploy to cluster¶
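This step is missing its command; assuming the file reviewed above:

```shell
# Create the Deployment and Service (both target the codewizard namespace)
kubectl apply -f resources/variables-from-yaml.yaml
```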
3. Test the app¶
# Get the pod name
kubectl get pods -n codewizard
# Test the response directly from the pod (no need for a separate container)
kubectl exec -it -n codewizard \
$(kubectl get pod -n codewizard -l name=codewizard-secrets -o jsonpath='{.items[0].metadata.name}') \
-- sh -c "curl -s localhost:5000"
# Expected response:
# Language: Hebrew
# Token : Hard-To-Guess2
Why not use the Service?
The Service makes the app accessible to other pods in the cluster. For quick testing, we can exec into the pod directly.
In a real environment you would use the service DNS name: codewizard-secrets.codewizard.svc.cluster.local:5000
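To test through the Service instead of exec-ing into the app pod, you can run a throwaway client pod (the `curlimages/curl` image is my choice here, not part of the lab resources):

```shell
# One-shot pod that curls the Service DNS name and is removed afterwards
kubectl run curl-test -n codewizard --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -s codewizard-secrets.codewizard.svc.cluster.local:5000
```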
04. Using Secrets & ConfigMaps (Imperative)¶
Now let’s externalize the configuration into proper Kubernetes resources.
1. Create a Secret and a ConfigMap¶
# Create the secret (imperative)
# Key = TOKEN
# Value = Hard-To-Guess3
kubectl create secret generic token \
-n codewizard \
--from-literal=TOKEN=Hard-To-Guess3
# Create the config map (imperative)
# Key = LANGUAGE
# Value = English
kubectl create configmap language \
-n codewizard \
--from-literal=LANGUAGE=English
2. Verify the resources were created¶
# List secrets and config maps
kubectl get secrets,cm -n codewizard
# View the secret details (note: data is Base64-encoded)
kubectl describe secret token -n codewizard
# View the config map details (note: data is plain text)
kubectl describe cm language -n codewizard
3. Decode a Secret value¶
Secrets are stored as Base64-encoded strings. To view the actual value:
# Get the raw Base64 value
kubectl get secret token -n codewizard -o jsonpath='{.data.TOKEN}'
# Decode it
kubectl get secret token -n codewizard -o jsonpath='{.data.TOKEN}' | base64 -d
# Output: Hard-To-Guess3
Important
Base64 is encoding, not encryption. Anyone with access to the Secret resource can decode it. For real security, consider using:
- Sealed Secrets
- External Secrets Operator
- HashiCorp Vault
- Enabling encryption at rest for etcd
05. Inject Secrets & ConfigMaps as environment variables¶
1. Update the deployment to reference Secret & ConfigMap¶
- Source file: resources/variables-from-secrets.yaml
- The key change is in the env section - instead of hardcoded values, we reference the ConfigMap and Secret:
env:
- name: LANGUAGE
valueFrom:
configMapKeyRef: # Read from the ConfigMap
name: language # The ConfigMap name
key: LANGUAGE # The key inside the ConfigMap
- name: TOKEN
valueFrom:
secretKeyRef: # Read from the Secret
name: token # The Secret name
key: TOKEN # The key inside the Secret
2. Apply the updated deployment¶
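This step is missing its command; assuming the source file named above:

```shell
kubectl apply -f resources/variables-from-secrets.yaml
```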
3. Test the changes¶
# Wait for the new pod to be ready
kubectl rollout status deployment/codewizard-secrets -n codewizard
# Test the response
kubectl exec -it -n codewizard \
$(kubectl get pod -n codewizard -l name=codewizard-secrets -o jsonpath='{.items[0].metadata.name}') \
-- sh -c "curl -s localhost:5000"
# Expected response:
# Language: English
# Token : Hard-To-Guess3
The values now come from the ConfigMap and Secret instead of being hardcoded!
06. Create Secrets & ConfigMaps declaratively (YAML)¶
Instead of imperative kubectl create commands, you can define Secrets and ConfigMaps in YAML files.
1. Secret YAML¶
- Source file: resources/secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: token
data:
# Base64-encoded value of "Hard-To-Guess3"
# echo -n "Hard-To-Guess3" | base64
TOKEN: SGFyZC1Uby1HdWVzczM=
type: Opaque
2. Using stringData (plain text - recommended for readability)¶
You can also use stringData to avoid manual Base64 encoding. Kubernetes will encode it for you:
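For example, this is equivalent to the Secret above, but written in plain text:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: token
type: Opaque
stringData:
  # Plain text - Kubernetes Base64-encodes it into .data on creation
  TOKEN: Hard-To-Guess3
```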
3. ConfigMap YAML¶
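This section is missing its example; a minimal sketch matching the imperative `language` ConfigMap from step 04 (the file path is my assumption):

```yaml
# resources/configmap.yaml (assumed path)
apiVersion: v1
kind: ConfigMap
metadata:
  name: language
data:
  LANGUAGE: English
```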
4. Apply declarative resources¶
# Apply the secret (delete existing one first to avoid conflicts)
kubectl delete secret token -n codewizard --ignore-not-found
kubectl apply -n codewizard -f resources/secret.yaml
# Verify
kubectl get secret token -n codewizard -o jsonpath='{.data.TOKEN}' | base64 -d
# Output: Hard-To-Guess3
07. Mount Secrets & ConfigMaps as volumes¶
Besides environment variables, you can mount Secrets and ConfigMaps as files inside the container. This is useful for configuration files, certificates, or any data that should appear as files.
1. Create a ConfigMap with a configuration file¶
# Create a ConfigMap from a literal that will be mounted as a file
kubectl create configmap app-config \
-n codewizard \
--from-literal=app.properties="server.port=5000
server.language=English
feature.debug=true"
2. Mount the ConfigMap as a volume¶
Add this to your deployment spec (the full file is shown for clarity):
apiVersion: apps/v1
kind: Deployment
metadata:
name: codewizard-secrets
namespace: codewizard
spec:
replicas: 1
selector:
matchLabels:
name: codewizard-secrets
template:
metadata:
labels:
name: codewizard-secrets
spec:
containers:
- name: secrets
image: nirgeier/k8s-secrets-sample
imagePullPolicy: Always
ports:
- containerPort: 5000
env:
- name: LANGUAGE
valueFrom:
configMapKeyRef:
name: language
key: LANGUAGE
- name: TOKEN
valueFrom:
secretKeyRef:
name: token
key: TOKEN
# Mount the ConfigMap as a file
volumeMounts:
- name: config-volume
mountPath: /etc/config
readOnly: true
- name: secret-volume
mountPath: /etc/secrets
readOnly: true
resources:
limits:
cpu: "500m"
memory: "256Mi"
volumes:
- name: config-volume
configMap:
name: app-config
- name: secret-volume
secret:
secretName: token
3. Verify the mounted files¶
# Exec into the pod and check the mounted files
POD=$(kubectl get pod -n codewizard -l name=codewizard-secrets -o jsonpath='{.items[0].metadata.name}')
# View secret and config files
kubectl exec -it -n codewizard "$POD" -- sh -c \
"echo '--- ConfigMap file ---'; \
cat /etc/config/app.properties; \
echo; \
echo '--- Secret file ---'; \
cat /etc/secrets/TOKEN"
Volume Mounts vs Environment Variables
| Feature | Environment Variables | Volume Mounts |
|---|---|---|
| Update method | Pod restart required | Auto-updated (with delay) |
| Best for | Simple key-value pairs | Config files, certificates |
| File format | N/A | Each key becomes a file |
08. Updating Secrets & ConfigMaps¶
Important
Pods do not automatically restart when Secrets or ConfigMaps change.
- Environment variables: Require a pod restart to pick up new values
- Volume mounts: Are eventually updated automatically (kubelet sync period, typically ~60s)
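You can observe the volume-mount behavior yourself (a sketch, assuming the app-config ConfigMap from step 07 is still mounted at /etc/config):

```shell
# Change a value in the mounted ConfigMap
kubectl create configmap app-config -n codewizard \
  --from-literal=app.properties="feature.debug=false" \
  -o yaml --dry-run=client | kubectl replace -f -

# After the kubelet sync period (~60s), the mounted file reflects the change - no restart needed
POD=$(kubectl get pod -n codewizard -l name=codewizard-secrets -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n codewizard "$POD" -- cat /etc/config/app.properties
```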
1. Update an existing Secret¶
# Use dry-run + replace to update an existing secret
kubectl create secret generic token \
-n codewizard \
--from-literal=TOKEN=NewToken123 \
-o yaml --dry-run=client | kubectl replace -f -
2. Restart the pods to pick up the changes¶
# Rolling restart - zero downtime
kubectl rollout restart deployment/codewizard-secrets -n codewizard
# Wait for rollout to complete
kubectl rollout status deployment/codewizard-secrets -n codewizard
3. Verify the new values¶
kubectl exec -it -n codewizard \
$(kubectl get pod -n codewizard -l name=codewizard-secrets -o jsonpath='{.items[0].metadata.name}') \
-- sh -c "curl -s localhost:5000"
# Expected response:
# Language: English
# Token : NewToken123
09. Immutable Secrets & ConfigMaps¶
Starting from Kubernetes v1.21, you can mark Secrets and ConfigMaps as immutable. This prevents accidental (or malicious) modifications and improves cluster performance.
apiVersion: v1
kind: ConfigMap
metadata:
name: stable-config
data:
VERSION: "1.0"
immutable: true # <-- Cannot be changed once created
# Once applied, attempting to modify this ConfigMap will fail:
# error: configmaps "stable-config" is immutable
When to use immutable resources
- Application configuration that should never change after deployment
- Certificates or credentials tied to a specific release
- Improves performance: kubelet skips watching for updates on immutable resources
10. Cleanup¶
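The heading above has no commands; deleting the namespace removes everything Part 1 created:

```shell
kubectl delete namespace codewizard --ignore-not-found
```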
Summary¶
| Concept | Description |
|---|---|
| Secret | Stores sensitive data as Base64-encoded key-value pairs |
| ConfigMap | Stores non-sensitive configuration as plain key-value pairs |
| Imperative creation | kubectl create secret/configmap - quick for testing |
| Declarative creation | YAML files with data: / stringData: - version-controlled |
| Env injection | valueFrom.secretKeyRef / valueFrom.configMapKeyRef |
| Volume mount | Mount as files inside the pod - auto-updates for volume mounts |
| Immutable | immutable: true - prevents changes, improves performance |
| Updating | Use dry-run=client + replace, then rollout restart for env vars |
Key Takeaways¶
- Never hardcode sensitive values in Deployment YAML files
- Secrets are not encrypted by default - they are only Base64-encoded
- ConfigMaps are for non-sensitive data; Secrets are for sensitive data
- Volume-mounted ConfigMaps/Secrets auto-update; env vars require pod restart
- Use immutable resources when values should never change after deployment
- In production, consider using external secret management tools (Vault, Sealed Secrets, etc.)
—¶
Part 2: Advanced Secrets Management - Sealed Secrets & External Secrets Operator¶
The Problem: Why Basic Secrets Are Not Enough¶
graph LR
subgraph problem["❌ The Problem"]
secret["K8S Secret\n(base64 encoded)"]
git["Stored in Git\n(anyone can decode)"]
etcd["Stored in etcd\n(unencrypted by default)"]
end
subgraph solution["✅ The Solutions"]
sealed["Sealed Secrets\n(asymmetric encryption)"]
eso["External Secrets Operator\n(sync from vault)"]
end
secret --> git
secret --> etcd
sealed --> |"Safe to commit"| git
eso --> |"Never in Git"| etcd
# Proof: base64 is NOT encryption
echo "my-super-secret-password" | base64
# bXktc3VwZXItc2VjcmV0LXBhc3N3b3JkCg==
echo "bXktc3VwZXItc2VjcmV0LXBhc3N3b3JkCg==" | base64 -d
# my-super-secret-password
Base64 ≠ Encryption
Kubernetes Secrets are only base64-encoded, not encrypted. Anyone with kubectl get secret -o yaml access can read them. This is the #1 misunderstanding in Kubernetes security.
Sealed Secrets¶
How Sealed Secrets Work¶
sequenceDiagram
participant Dev as Developer
participant KS as kubeseal CLI
participant Ctrl as Sealed Secrets Controller
participant K8S as Kubernetes API
Dev->>KS: Create Secret YAML
KS->>Ctrl: Fetch public key
KS->>Dev: Encrypted SealedSecret YAML
Dev->>Dev: Commit SealedSecret to Git ✅
Note over Dev: Safe - only the controller<br/>can decrypt
Dev->>K8S: kubectl apply SealedSecret
K8S->>Ctrl: SealedSecret created
Ctrl->>K8S: Decrypts → creates real Secret
08. Install Sealed Secrets Controller¶
# Add the Sealed Secrets Helm repo
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm repo update
# Install the controller in kube-system
helm install sealed-secrets sealed-secrets/sealed-secrets \
--namespace kube-system \
--set fullnameOverride=sealed-secrets-controller
# Wait for it to be ready
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=sealed-secrets \
-n kube-system --timeout=120s
09. Install kubeseal CLI¶
KUBESEAL_VERSION=$(curl -s https://api.github.com/repos/bitnami-labs/sealed-secrets/releases/latest | grep tag_name | cut -d '"' -f4 | cut -c2-)
curl -OL "https://github.com/bitnami-labs/sealed-secrets/releases/download/v${KUBESEAL_VERSION}/kubeseal-${KUBESEAL_VERSION}-linux-amd64.tar.gz"
tar -xvzf kubeseal-${KUBESEAL_VERSION}-linux-amd64.tar.gz kubeseal
sudo install -m 755 kubeseal /usr/local/bin/kubeseal
rm kubeseal kubeseal-${KUBESEAL_VERSION}-linux-amd64.tar.gz
10. Create a SealedSecret¶
# Create the namespace
kubectl create namespace secrets-lab --dry-run=client -o yaml | kubectl apply -f -
# Step 1: Create a regular Secret (don't apply it!)
kubectl create secret generic db-credentials \
--namespace secrets-lab \
--from-literal=username=admin \
--from-literal=password=S3cur3P@ssw0rd \
--dry-run=client -o yaml > /tmp/db-secret.yaml
# Step 2: Seal it with kubeseal
kubeseal --format yaml < /tmp/db-secret.yaml > resources/sealed-db-credentials.yaml
# Step 3: Clean up the plaintext secret
rm /tmp/db-secret.yaml
# Step 4: View the sealed secret (safe to commit to Git!)
cat resources/sealed-db-credentials.yaml
GitOps-Safe
The resulting SealedSecret YAML contains encrypted data that can only be decrypted by the Sealed Secrets controller running in your cluster. It is safe to commit to Git.
11. Apply the SealedSecret¶
# Apply the SealedSecret
kubectl apply -f resources/sealed-db-credentials.yaml
# The controller automatically creates a real Secret
kubectl get secret db-credentials -n secrets-lab
# NAME TYPE DATA AGE
# db-credentials Opaque 2 5s
# Verify the decrypted values
kubectl get secret db-credentials -n secrets-lab -o jsonpath='{.data.username}' | base64 -d
# admin
kubectl get secret db-credentials -n secrets-lab -o jsonpath='{.data.password}' | base64 -d
# S3cur3P@ssw0rd
12. Use the SealedSecret in a Pod¶
- Source file: resources/pod-with-sealed-secret.yaml
# resources/pod-with-sealed-secret.yaml
apiVersion: v1
kind: Pod
metadata:
name: secret-consumer
namespace: secrets-lab
spec:
containers:
- name: app
image: busybox:latest
command: ["sh", "-c", "echo Username=$DB_USER Password=$DB_PASS && sleep 3600"]
env:
- name: DB_USER
valueFrom:
secretKeyRef:
name: db-credentials
key: username
- name: DB_PASS
valueFrom:
secretKeyRef:
name: db-credentials
key: password
kubectl apply -f resources/pod-with-sealed-secret.yaml
kubectl wait --for=condition=Ready pod/secret-consumer -n secrets-lab --timeout=60s
# Check the logs to see the injected values
kubectl logs secret-consumer -n secrets-lab
# Username=admin Password=S3cur3P@ssw0rd
External Secrets Operator (ESO)¶
The External Secrets Operator syncs secrets from external providers (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, Azure Key Vault, etc.) into Kubernetes Secrets.
How ESO Works¶
graph LR
subgraph external["External Provider"]
vault["HashiCorp Vault\nAWS Secrets Manager\nGCP Secret Manager\nAzure Key Vault"]
end
subgraph cluster["Kubernetes Cluster"]
ss["SecretStore /\nClusterSecretStore"]
es["ExternalSecret"]
eso["ESO Controller"]
secret["Kubernetes Secret\n(auto-created)"]
end
vault --> ss
ss --> eso
es --> eso
eso --> secret
| CRD | Purpose |
|---|---|
| SecretStore | Defines connection to an external provider (namespace-scoped) |
| ClusterSecretStore | Same as SecretStore but cluster-scoped |
| ExternalSecret | Declares which secrets to fetch and how to map them |
13. Install External Secrets Operator¶
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
helm install external-secrets external-secrets/external-secrets \
--namespace external-secrets \
--create-namespace \
--set installCRDs=true
# Wait for it
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=external-secrets \
-n external-secrets --timeout=120s
14. Use ESO with a Kubernetes Secret Store (for learning)¶
For this lab, we’ll use the Kubernetes provider - ESO reads from a Secret in one namespace and syncs it to another. In production, you’d replace this with Vault, AWS, etc.
# Create a "source" secret in a secured namespace (simulating an external provider)
kubectl create namespace secret-store
kubectl create secret generic app-secrets \
--namespace secret-store \
--from-literal=api-key=my-api-key-12345 \
--from-literal=api-secret=super-secret-value
Create a SecretStore pointing to the Kubernetes provider:
- Source file: resources/secret-store.yaml
# resources/secret-store.yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: k8s-secret-store
namespace: secrets-lab
spec:
provider:
kubernetes:
remoteNamespace: secret-store
server:
caProvider:
type: ConfigMap
name: kube-root-ca.crt
key: ca.crt
auth:
serviceAccount:
name: eso-reader
# Create the ServiceAccount for ESO
kubectl create serviceaccount eso-reader -n secrets-lab
# Grant it permission to read secrets in the source namespace
kubectl create role secret-reader \
--namespace secret-store \
--verb=get,list,watch \
--resource=secrets
kubectl create rolebinding eso-secret-reader \
--namespace secret-store \
--role=secret-reader \
--serviceaccount=secrets-lab:eso-reader
kubectl apply -f resources/secret-store.yaml
15. Create an ExternalSecret¶
- Source file: resources/external-secret.yaml
# resources/external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: app-credentials
namespace: secrets-lab
spec:
refreshInterval: 1h # How often to sync
secretStoreRef:
name: k8s-secret-store
kind: SecretStore
target:
name: app-credentials # Name of the K8S Secret to create
creationPolicy: Owner
data:
- secretKey: API_KEY # Key in the target Secret
remoteRef:
key: app-secrets # Name of the source Secret
property: api-key # Key in the source Secret
- secretKey: API_SECRET
remoteRef:
key: app-secrets
property: api-secret
kubectl apply -f resources/external-secret.yaml
# Check the ExternalSecret status
kubectl get externalsecret -n secrets-lab
# NAME STORE REFRESH INTERVAL STATUS
# app-credentials k8s-secret-store 1h SecretSynced
# The Secret was automatically created!
kubectl get secret app-credentials -n secrets-lab
kubectl get secret app-credentials -n secrets-lab -o jsonpath='{.data.API_KEY}' | base64 -d
# my-api-key-12345
16. Cleanup (Part 2)¶
# Uninstall Sealed Secrets
helm uninstall sealed-secrets -n kube-system
# Uninstall External Secrets
helm uninstall external-secrets -n external-secrets
# Delete namespaces
kubectl delete namespace secrets-lab secret-store external-secrets --ignore-not-found
Part 2 Summary¶
| Approach | Best For | Git-Safe? | External Provider? |
|---|---|---|---|
| Plain K8S Secrets | Development/testing only | ❌ No | No |
| Sealed Secrets | GitOps - encrypt secrets for Git storage | ✅ Yes | No |
| External Secrets (ESO) | Production - centralized secret management | ✅ Yes | Yes (Vault, AWS…) |
| Concept | Key Takeaway |
|---|---|
| base64 ≠ encryption | K8S Secrets are encoded, not encrypted |
| Sealed Secrets | Encrypt + commit to Git; controller decrypts in-cluster |
| kubeseal CLI | Encrypts secrets using the controller’s public key |
| External Secrets Operator | Syncs secrets from external vaults into K8S |
| SecretStore | Defines connection to external provider |
| ExternalSecret | Declares what to fetch and where to put it |
| refreshInterval | ESO periodically re-syncs - secrets stay up-to-date |
Exercises¶
The following exercises will test your understanding of Kubernetes secret management tools. Try to solve each exercise on your own before revealing the solution.
01. Seal a Secret and Verify It Cannot Be Decoded Without the Controller¶
Create a regular Secret, seal it with kubeseal, and then inspect the SealedSecret output. Verify the encrypted data cannot be decoded with base64 --decode.
Scenario:¶
- You want to store credentials in Git but need to prove the encryption is real.
- You need to show that the SealedSecret is not just base64 - it's truly encrypted.
Hint: Create a Secret with --dry-run=client -o yaml, pipe to kubeseal --format yaml, then try to base64-decode the encryptedData values.
Solution
## Ensure the Sealed Secrets controller is installed
## (see step 08 in this lab)
## Create a regular Secret (don't apply it)
kubectl create secret generic test-sealed \
--namespace secrets-lab \
--from-literal=api-key=my-secret-api-key-123 \
--dry-run=client -o yaml > /tmp/test-secret.yaml
## View the regular Secret - base64 encoded but easily decoded
cat /tmp/test-secret.yaml
kubectl create secret generic test-sealed \
--from-literal=api-key=my-secret-api-key-123 \
--dry-run=client -o jsonpath='{.data.api-key}' | base64 -d
echo ## Output: my-secret-api-key-123
## Seal it
kubeseal --format yaml < /tmp/test-secret.yaml > /tmp/sealed-test.yaml
## View the SealedSecret - encrypted data
cat /tmp/sealed-test.yaml
## Try to base64-decode the encryptedData (will produce binary garbage, not readable)
grep "api-key:" /tmp/sealed-test.yaml | awk '{print $2}' | base64 -d 2>&1 || echo "Cannot decode - it's encrypted, not just encoded!"
## Clean up
rm /tmp/test-secret.yaml /tmp/sealed-test.yaml
02. Rotate a Sealed Secret¶
Update a SealedSecret by creating a new version with an updated password, apply it, and verify the controller updates the real Secret.
Scenario:¶
- A database password has been rotated and you need to update the SealedSecret in Git.
- When the new SealedSecret is applied, the controller should update the real Secret automatically.
Hint: Create a new Secret with the updated value, seal it again, and kubectl apply the new SealedSecret.
Solution
## Create the original SealedSecret
kubectl create secret generic rotate-demo \
--namespace secrets-lab \
--from-literal=password=old-password-v1 \
--dry-run=client -o yaml | kubeseal --format yaml | kubectl apply -f -
## Verify the original Secret was created
kubectl get secret rotate-demo -n secrets-lab -o jsonpath='{.data.password}' | base64 -d
echo ## Output: old-password-v1
## Create a NEW SealedSecret with the rotated password
kubectl create secret generic rotate-demo \
--namespace secrets-lab \
--from-literal=password=new-password-v2 \
--dry-run=client -o yaml | kubeseal --format yaml | kubectl apply -f -
## Verify the Secret was updated
sleep 5
kubectl get secret rotate-demo -n secrets-lab -o jsonpath='{.data.password}' | base64 -d
echo ## Output: new-password-v2
## Clean up
kubectl delete sealedsecret rotate-demo -n secrets-lab 2>/dev/null || true
kubectl delete secret rotate-demo -n secrets-lab
03. Create a ClusterSecretStore and Use It Across Namespaces¶
Set up a ClusterSecretStore (using the Kubernetes provider) and create ExternalSecrets in two different namespaces that reference the same store.
Scenario:¶
- Multiple teams need to access the same external secrets provider.
- Instead of creating a SecretStore in each namespace, you want a single cluster-wide store.
Hint: Define the store with kind: ClusterSecretStore, then set kind: ClusterSecretStore in the ExternalSecret's secretStoreRef.
Solution
## Create source namespace with a shared Secret
kubectl create namespace shared-secrets
kubectl create secret generic shared-creds \
--namespace shared-secrets \
--from-literal=db-password=shared-db-pass-123
## Create a service account for ESO
kubectl create serviceaccount eso-cluster-reader -n external-secrets
## Grant permissions to read from the source namespace
kubectl create role secret-reader \
--namespace shared-secrets \
--verb=get,list,watch \
--resource=secrets
kubectl create rolebinding eso-cluster-reader-binding \
--namespace shared-secrets \
--role=secret-reader \
--serviceaccount=external-secrets:eso-cluster-reader
## Create the ClusterSecretStore
cat <<'EOF' | kubectl apply -f -
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
name: global-k8s-store
spec:
provider:
kubernetes:
remoteNamespace: shared-secrets
server:
caProvider:
type: ConfigMap
name: kube-root-ca.crt
namespace: external-secrets
key: ca.crt
auth:
serviceAccount:
name: eso-cluster-reader
namespace: external-secrets
EOF
## Create ExternalSecrets in two namespaces
kubectl create namespace team-alpha
kubectl create namespace team-beta
for ns in team-alpha team-beta; do
cat <<EOF | kubectl apply -f -
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: db-creds
namespace: $ns
spec:
refreshInterval: 1h
secretStoreRef:
name: global-k8s-store
kind: ClusterSecretStore
target:
name: db-credentials
creationPolicy: Owner
data:
- secretKey: DB_PASSWORD
remoteRef:
key: shared-creds
property: db-password
EOF
done
## Verify Secrets in both namespaces
kubectl get secret db-credentials -n team-alpha
kubectl get secret db-credentials -n team-beta
## Clean up
kubectl delete namespace team-alpha team-beta shared-secrets
04. Back Up and Restore Sealed Secrets Controller Keys¶
Export the Sealed Secrets controller’s encryption keys (for backup/disaster recovery) and verify you understand the key management lifecycle.
Scenario:¶
- You need to back up the Sealed Secrets controller's private key for disaster recovery.
- If the controller is reinstalled without the backup, existing SealedSecrets cannot be decrypted.
Hint: The controller’s keys are stored as Secrets in the kube-system namespace with a label sealedsecrets.bitnami.com/sealed-secrets-key.
Solution
## List the controller's encryption keys
kubectl get secret -n kube-system -l sealedsecrets.bitnami.com/sealed-secrets-key
## Back up the key(s) to a file (KEEP THIS SECURE!)
kubectl get secret -n kube-system \
-l sealedsecrets.bitnami.com/sealed-secrets-key \
-o yaml > /tmp/sealed-secrets-backup.yaml
## The backup contains the private key - treat it with extreme care
echo "Backup saved. This file contains the private key and must be stored securely."
echo "Without this key, existing SealedSecrets cannot be decrypted after controller reinstall."
## To restore after reinstalling the controller:
## kubectl apply -f /tmp/sealed-secrets-backup.yaml
## kubectl rollout restart deployment/sealed-secrets-controller -n kube-system
## Clean up the backup (in real scenarios, store it in a secure vault)
rm /tmp/sealed-secrets-backup.yaml
Troubleshooting¶
- kubeseal cannot connect to the controller:
Verify the Sealed Secrets controller is running and accessible:
## Check the controller pod
kubectl get pods -n kube-system -l app.kubernetes.io/name=sealed-secrets
## Check the controller service
kubectl get service -n kube-system -l app.kubernetes.io/name=sealed-secrets
## Try fetching the public key manually
kubeseal --fetch-cert
- SealedSecret is not creating a Secret:
Check the SealedSecret status and controller logs:
## Check the SealedSecret status
kubectl get sealedsecret -n secrets-lab
kubectl describe sealedsecret <name> -n secrets-lab
## Check controller logs for errors
kubectl logs -n kube-system -l app.kubernetes.io/name=sealed-secrets --tail=50
- ExternalSecret status shows error:
Verify the SecretStore connection and permissions:
## Check the ExternalSecret status
kubectl describe externalsecret <name> -n secrets-lab
## Check the SecretStore status
kubectl describe secretstore k8s-secret-store -n secrets-lab
## Verify the ServiceAccount has correct RBAC
kubectl auth can-i get secrets -n secret-store \
--as system:serviceaccount:secrets-lab:eso-reader
Kustomization - kubectl kustomize¶
- `Kustomize` is a very powerful tool for customizing and building Kubernetes resources. `Kustomize` started in 2017 and has been built into `kubectl` since version 1.14. `Kustomize` has many useful features for managing and deploying resources.
- When you execute a Kustomization, besides applying the built-in features, it also re-orders the resources in a logical way for K8S to deploy.
What will we learn?¶
- How Kustomize re-orders Kubernetes resources
- Common Kustomize features: annotations, labels, generators, images, namespaces, prefixes/suffixes, replicas
- How to use ConfigMap and Secret generators
- How to use patches to modify resources
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster (v1.14+)
Declarative Configuration in Kubernetes¶
01. Re-order the resources¶
- `Kustomization` re-orders the resources by `Kind` for optimized deployment. For this demo, we will need an existing `namespace` before using it.
- The order of the resources is defined in the Kustomize source code:
// An attempt to order things to help k8s, e.g.
// - Namespace should be first.
// - Service should come before things that refer to it.
// In some cases order just specified to provide determinism.
var orderFirst = []string{
"Namespace",
"ResourceQuota",
"StorageClass",
"CustomResourceDefinition",
"ServiceAccount",
"PodSecurityPolicy",
"Role",
"ClusterRole",
"RoleBinding",
"ClusterRoleBinding",
"ConfigMap",
"Secret",
"Endpoints",
"Service",
"LimitRange",
"PriorityClass",
"PersistentVolume",
"PersistentVolumeClaim",
"Deployment",
"StatefulSet",
"CronJob",
"PodDisruptionBudget",
}
var orderLast = []string{
"MutatingWebhookConfiguration",
"ValidatingWebhookConfiguration",
}
02. Base resource for our demo¶
- In the following samples we will refer to the following `base.yaml` file:
# base.yaml
# This is the base file for all the demos in this folder
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: __image__
03. Common Features¶
- common Annotation
- common Labels
- Generators
- Config Map Generator
- Secret Generator
- images
- Namespaces
- Prefix / Suffix
- Replicas
- Patches
commonAnnotation¶
### FileName: kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# This will add annotation under every metadata entry
# ex: main metadata, spec.metadata etc
commonAnnotations:
author: nirgeier@gmail.com
- Output:
### commonAnnotation output
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
### Annotation added here
author: nirgeier@gmail.com
name: myapp
spec:
selector:
matchLabels:
app: myapp
template:
metadata:
### Annotation added here
annotations:
author: nirgeier@gmail.com
labels:
app: myapp
spec:
containers:
- image: __image__
name: myapp
commonLabels¶
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# This will add annotation under every metadata entry
# ex: main metadata, spec.metadata etc
commonLabels:
author: nirgeier@gmail.com
env: codeWizard-cluster
bases:
- ../_base
- Output:
apiVersion: apps/v1
kind: Deployment
metadata:
# Labels added ....
labels:
author: nirgeier@gmail.com
env: codeWizard-cluster
name: myapp
spec:
selector:
matchLabels:
app: myapp
# Labels added ....
author: nirgeier@gmail.com
env: codeWizard-cluster
template:
metadata:
labels:
app: myapp
# Labels added ....
author: nirgeier@gmail.com
env: codeWizard-cluster
spec:
containers:
- image: __image__
name: myapp
Generators¶
- Kustomization also supports generating `ConfigMap`/`Secret` resources in several ways.
- The default behavior is to append a hash of the generated content as a suffix to the name, e.g.:
secretMapFromFile-495dtcb64g
apiVersion: v1
data:
APP_ENV: ZGV2ZWxvcG1lbnQ=
LOG_DEBUG: dHJ1ZQ==
NODE_ENV: ZGV2
REGION: d2V1
kind: Secret
metadata:
name: secretMapFromFile-495dtcb64g # <--------------------------
type: Opaque
- We can disable the suffix with the following addition to the `kustomization.yaml`:
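The YAML addition itself is missing here; it is the same `generatorOptions` block shown in the Secret Generator example later in this section:

```yaml
# kustomization.yaml
generatorOptions:
  # Do not append the content-hash suffix to generated names
  disableNameSuffixHash: true
```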
configMapGenerator¶
From Env¶
- Input files: `.env` and `kustomization.yaml`
- The output of `configMapFromEnv`: each `KEY=VALUE` line of the env file becomes a ConfigMap entry.
From File¶
- Input files: `.env` and `kustomization.yaml`
- The output of `configMapFromFile`: the entire file content is stored under a single key named after the file.
From Literal¶
- Input file: `kustomization.yaml`
- The output of `configMapFromLiteral`: the ConfigMap is built from inline key/value pairs.
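The example files for these three variants are missing above; a minimal sketch of all three forms in one `kustomization.yaml` (the generator names such as `configMapFromEnv` follow the text above) would be:

```yaml
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configMapGenerator:
  # From env: each KEY=VALUE line in .env becomes a ConfigMap entry
  - name: configMapFromEnv
    env: .env
  # From file: the whole .env file becomes a single entry keyed by filename
  - name: configMapFromFile
    files:
      - .env
  # From literal: inline key/value pairs
  - name: configMapFromLiteral
    literals:
      - APP_ENV=development
      - LOG_DEBUG="true"
```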
Secret Generator¶
# Similar to configMap but with an additional type field
secretGenerator:
# Generate secret from env file
- name: secretMapFromFile
env: .env
type: Opaque
generatorOptions:
disableNameSuffixHash: true
images¶
- Modify the name, tag, and/or digest of images.
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ./base.yaml
images:
# The image as its defined in the Deployment file
- name: __image__
# The new name to set
newName: my-registry/my-image
# optional: image tag
newTag: v1
- Output:
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
# --- This image was updated
- image: my-registry/my-image:v1
name: myapp
Namespaces¶
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# Add the desired namespace to all resources
namespace: kustomize-namespace
bases:
- ../_base
- Output:
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
# Namespace added here
namespace: kustomize-namespace
Prefix-suffix¶
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# Add the desired Prefix to all resources
namePrefix: prefix-codeWizard-
nameSuffix: -suffix-codeWizard
bases:
- ../_base
- Output:
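The output block is empty above; given the base Deployment named `myapp`, the rendered result would look like this (the prefix and suffix are applied to resource names):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  # namePrefix and nameSuffix applied to the resource name
  name: prefix-codeWizard-myapp-suffix-codeWizard
```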
Replicas¶
- deployment
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: deployment
spec:
replicas: 5
selector:
name: deployment
template:
containers:
- name: container
image: registry/container:latest
- kustomization
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
replicas:
- name: deployment
count: 10
resources:
- deployment.yaml
- Output:
Note
There is a bug with the replicas entry: `kubectl kustomize` returns an error in some versions.
kubectl kustomize .
# For some reason we get this error:
Error: json: unknown field "replicas"
# Workaround for this error for now is:
kustomize build .
apiVersion: apps/v1
kind: Deployment
metadata:
name: deployment
spec:
replicas: 10
selector:
name: deployment
template:
containers:
- image: registry/container:latest
name: container
Patches¶
- There are several types of patches and patch directives, such as `$patch: replace`, `$patch: delete`, and `patchesStrategicMerge`.
- For this demo we will demonstrate `patchesStrategicMerge`.
Patch Add/Update¶
# File: patch-memory.yaml
# -----------------------
# Patch limits.memory
apiVersion: apps/v1
kind: Deployment
# Set the desired deployment to patch
metadata:
name: myapp
spec:
# patch the memory limit
template:
spec:
containers:
- name: patch-name
resources:
limits:
memory: 512Mi
# File: patch-replicas.yaml
# -------------------------
apiVersion: apps/v1
kind: Deployment
# Set the desired deployment to patch
metadata:
name: myapp
spec:
# This is the patch for this demo
replicas: 3
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../_base
patchesStrategicMerge:
- patch-memory.yaml
- patch-replicas.yaml
- Output:
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
# This is the first patch
replicas: 3
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
# This is the second patch
containers:
- name: patch-name
resources:
limits:
memory: 512Mi
- image: __image__
name: myapp
Patch-Delete¶
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../_base
patchesStrategicMerge:
- patch-delete.yaml
# patch-delete.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
template:
spec:
containers:
# Remove this section, in this demo it will remove the
# image with the `name: myapp`
- $patch: delete
name: myapp
image: __image__
- Output:
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- image: nginx
name: nginx
Patch Replace¶
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../_base
patchesStrategicMerge:
- patch-replace.yaml
# patch-replace.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
template:
spec:
containers:
# Replace the whole containers list with the
# entries below, instead of merging them
- $patch: replace
- name: myapp
image: nginx:latest
args:
- one
- two
- Output:
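The output block is empty above; with `$patch: replace` the containers list is replaced wholesale rather than merged, so the rendered Deployment would look roughly like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        # The original container list was replaced, not merged
        - args:
            - one
            - two
          image: nginx:latest
          name: myapp
```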
StatefulSets¶

- In this lab we will learn about `StatefulSets` in Kubernetes and how they differ from `Deployments`.
- We will deploy a PostgreSQL database as a StatefulSet and verify that data persists across pod restarts.
What will we learn?¶
- The difference between stateless and stateful applications
- How `StatefulSets` maintain sticky identities and stable storage
- How to deploy a PostgreSQL database as a StatefulSet
- How to verify data persistence across pod restarts and scaling
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster
- `psql` client installed (for testing PostgreSQL)
Introduction¶
The Difference Between a Statefulset And a Deployment¶
Stateless application¶
- A stateless application does not care which network it uses, does not need permanent storage, and can be scaled up and down without re-using the same network identity or persistence.
- Deployment is the suitable kind for stateless applications.
- The most trivial example of a stateless app is a `Web Server`.
Stateful application¶
- Stateful applications need to re-use the same resources (network identity, storage, etc.) in order to work properly.
- Usually with `Stateful` applications you will need to ensure that pods can reach each other through a unique identity that does not change (e.g., hostname, IP).
- The most trivial example of a Stateful app is a database of any kind.
Stateful Notes
- Like a Deployment, a `StatefulSet` manages Pods that are based on an identical container spec.
- Unlike a Deployment, a `StatefulSet` maintains a sticky identity for each of its Pods.
- These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
- Deleting and/or scaling down a `StatefulSet` will not delete the volumes associated with the `StatefulSet`. This is done to ensure data safety.
- A `StatefulSet` keeps a unique identity for each Pod and assigns the same identity to those pods when they are rescheduled (update, restart, etc.).
- The storage for a given Pod must either be provisioned by a `PersistentVolume` provisioner, based on the requested storage class, or pre-provisioned by an admin.
- A `StatefulSet` manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
- A `stateful` app needs to use dedicated storage.
Stable Network Identity¶
- A `Stateful` application node must have a unique hostname and IP address so that other nodes in the same application know how to reach it.
- A `ReplicaSet` assigns a random hostname and IP address to each Pod. In such a case, we must use a service which exposes those Pods for us.
Start and Termination Order¶
- Each `StatefulSet` Pod follows this naming pattern: `$(statefulSet name)-$(ordinal)`
- `Stateful` applications are restarted or re-created following the creation order.
- A `ReplicaSet` does not follow a specific order when starting or killing its pods.
StatefulSet Volumes¶
- A `StatefulSet` does not create a volume for you.
- When a `StatefulSet` is deleted, the respective volumes are not deleted with it.
To address all these requirements, Kubernetes offers the StatefulSet primitive.¶
01. Create namespace and clear previous data if there is any¶
# If the namespace already exists and contains data from previous steps, let's clean it
kubectl delete namespace codewizard
# Create the desired namespace [codewizard]
kubectl create namespace codewizard
namespace/codewizard created
02. Create and test the Stateful application¶
- In order to deploy the StatefulSet we will need the following resources:
  - `ConfigMap`
  - `Service`
  - `StatefulSet`
  - `PersistentVolumeClaim` or `PersistentVolume`
- All the resources, including the `kustomization` script, are defined inside the base folder.
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-config
labels:
app: postgres
data:
# The following names are the ones defined in the official postgres docs
# The name of the database we will use in this demo
POSTGRES_DB: codewizard
# the user name for this demo
POSTGRES_USER: codewizard
# The password for this demo
POSTGRES_PASSWORD: admin123
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: postgres-pv-claim
labels:
app: postgres
spec:
# in this demo we use GCP so we are using the 'standard' StorageClass
# We can of course define our own StorageClass resource
storageClassName: standard
# The access modes are:
# ReadWriteOnce - The volume can be mounted as read-write by a single node
# ReadWriteMany - The volume can be mounted as read-write by many nodes
# ReadOnlyMany - The volume can be mounted as read-only by many nodes
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
replicas: 1
# StatefulSet must contain a serviceName
serviceName: postgres
selector:
matchLabels:
app: postgres # has to match .spec.template.metadata.labels
template:
metadata:
labels:
app: postgres # has to match .spec.selector.matchLabels
spec:
containers:
- name: postgres
image: postgres:10.4
imagePullPolicy: "IfNotPresent"
# The default DB port
ports:
- containerPort: 5432
# Load the required configuration env values from the configMap
envFrom:
- configMapRef:
name: postgres-config
# Use volume for storage
volumeMounts:
- mountPath: /var/lib/postgresql/data
name: postgredb
# We can use PersistentVolume or PersistentVolumeClaim.
# In this sample we are using PersistentVolumeClaim
volumes:
- name: postgredb
persistentVolumeClaim:
# reference to Pre-Define PVC
claimName: postgres-pv-claim
Note: You can use the kustomization file to create or apply all the above resources
# Generate and apply the required resources using kustomization
kubectl kustomize PostgreSQL/ | kubectl apply -f -
03. Test the Stateful application¶
- Use the testDB.sh script to test the StatefulSet.
- Don't forget to set the execution flag (`chmod +x testDB.sh`) if required.
### Test to see if the StatefulSet "saves" the state of the pods
# Programmatically get the port and the IP
export CLUSTER_IP=$(kubectl get nodes \
--selector=node-role.kubernetes.io/control-plane \
-o jsonpath='{$.items[*].status.addresses[?(@.type=="InternalIP")].address}')
export NODE_PORT=$(kubectl get \
services postgres \
-o jsonpath="{.spec.ports[0].nodePort}" \
-n codewizard)
export POSTGRES_DB=$(kubectl get \
configmap postgres-config \
-o jsonpath='{.data.POSTGRES_DB}' \
-n codewizard)
export POSTGRES_USER=$(kubectl get \
configmap postgres-config \
-o jsonpath='{.data.POSTGRES_USER}' \
-n codewizard)
export PGPASSWORD=$(kubectl get \
configmap postgres-config \
-o jsonpath='{.data.POSTGRES_PASSWORD}' \
-n codewizard)
# Check to see if we have all the required variables
printenv | grep -E 'POSTGRES|PGPASSWORD'
# Connect to postgres and create table if required.
# Once the table exists - add row into the table
# you can run this command as many times as you like
psql \
-U ${POSTGRES_USER} \
-h ${CLUSTER_IP} \
-d ${POSTGRES_DB} \
-p ${NODE_PORT} \
-c "CREATE TABLE IF NOT EXISTS stateful (str VARCHAR); INSERT INTO stateful values (1); SELECT count(*) FROM stateful"
04. Scale down the StatefulSet and check that its down¶
04.01. Scale down the Statefulset to 0¶
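The command is missing under this heading; for the postgres StatefulSet from this lab it would be:

```shell
# Scale the postgres StatefulSet down to 0 replicas
kubectl scale statefulset postgres --replicas=0 -n codewizard
```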
04.02. Verify that the pods Terminated¶
# Wait until the pods will be terminated
kubectl get pods -n codewizard --watch
NAME READY STATUS RESTARTS AGE
postgres-0 1/1 Running 0 32m
postgres-0 1/1 Terminating 0 32m
postgres-0 0/1 Terminating 0 32m
postgres-0 0/1 Terminating 0 33m
postgres-0 0/1 Terminating 0 33m
04.03. Verify that the DB is not reachable¶
- If the DB is not reachable, it means that all the pods are down
psql \
-U ${POSTGRES_USER} \
-h ${CLUSTER_IP} \
-d ${POSTGRES_DB} \
-p ${NODE_PORT} \
-c "SELECT count(*) FROM stateful"
# You should get output similar to this one:
psql: error: could not connect to server: Connection refused
Is the server running on host "192.168.49.2" and accepting
TCP/IP connections on port 32570?
05. Scale up again and verify that we still have the previous data¶
05.01. Scale up the StatefulSet to 1 or more¶
05.02. Verify that the pod is in Running status¶
05.03. Verify that the pod is using the previous data¶
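The commands for these three steps are missing; mirroring the scale-down flow above, they would look like:

```shell
# 05.01 - Scale the StatefulSet back up
kubectl scale statefulset postgres --replicas=1 -n codewizard

# 05.02 - Wait until the pod is Running
kubectl get pods -n codewizard --watch

# 05.03 - Re-run the query from the test script; the rows inserted
# before the scale-down should still be there
psql \
  -U ${POSTGRES_USER} \
  -h ${CLUSTER_IP} \
  -d ${POSTGRES_DB} \
  -p ${NODE_PORT} \
  -c "SELECT count(*) FROM stateful"
```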
WordPress, MySQL, PVC¶
- In this lab you will deploy a WordPress site and a MySQL database.
- You will use `PersistentVolumes` and `PersistentVolumeClaims` as storage.
What will we learn?¶
- How to deploy a multi-tier application (WordPress + MySQL) on Kubernetes
- How to use `PersistentVolumeClaims` for persistent storage
- How to use `kustomization.yaml` with secret generators
- How to use port forwarding to test applications locally
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster
- Minikube (for LoadBalancer support)
Walkthrough¶
- Patch `minikube` so we can use `Service: LoadBalancer`
# Source:
# https://github.com/knative/serving/blob/b31d96e03bfa1752031d0bc4ae2a3a00744d6cd5/docs/creating-a-kubernetes-cluster.md#loadbalancer-support-in-minikube
sudo ip route add \
$(cat ~/.minikube/profiles/minikube/config.json | \
jq -r ".KubernetesConfig.ServiceCIDR") \
via $(minikube ip)
kubectl run minikube-lb-patch \
--replicas=1 \
--image=elsonrodriguez/minikube-lb-patch:0.1 \
--namespace=kube-system
- Create the desired `Namespace`
- Create the `MySQL` resources:
  - Create `Service`
  - Create `PersistentVolumeClaims`
  - Create `Deployment`
  - Create `password file`
- Create the WordPress resources:
  - Create `Service`
  - Create `PersistentVolumeClaims`
  - Create `Deployment`
- Create a `kustomization.yaml` with:
  - `Secret generator`
  - `MySQL` resources
  - `WordPress` resources
- Deploy the stack
- Port forward from the host to the application
  - We use a port forward so we will be able to test and verify that WordPress is actually running:
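The port-forward command itself is missing here; assuming the WordPress Service is named `wordpress` and listens on port 80 (adjust if your resources use different names), it would be:

```shell
# Forward local port 8080 to the wordpress Service (port 80)
kubectl port-forward service/wordpress 8080:80
# Then open http://localhost:8080 in a browser
```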
Cleanup¶
Observability¶
Logging¶
- Welcome to the `Logging` hands-on lab! In this tutorial, we will learn the essentials of `Logging` in Kubernetes clusters.
- We will deploy a sample application, configure log collection, and explore logs using popular tools like `Fluentd`, `Elasticsearch`, and `Kibana` (the EFK stack).
What will we learn?¶
- Why `Logging` is important in Kubernetes
- How to deploy a sample app that generates logs
- How to collect logs using `Fluentd`
- How to store and search logs with `Elasticsearch`
- How to visualize logs with `Kibana`
- Troubleshooting and best practices
Introduction¶
- `Logging` is critical for monitoring, debugging, and auditing applications in Kubernetes.
- Kubernetes does not provide a builtin, centralized `Logging` solution, but it allows us to integrate with many `Logging` stacks.
- We will set up the EFK stack (`Elasticsearch`, `Fluentd`, `Kibana`) to collect, store, and visualize logs from our cluster.
Lab¶
Step 01 - Deploy a Sample Application¶
- Deploy a simple `Nginx` application that generates access logs.
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
- Check that the pod is running:
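The check command is missing; `kubectl create deployment` labels the pods with `app=nginx`, so:

```shell
# Confirm the nginx pod is Running
kubectl get pods -l app=nginx
```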
Step 02 - Deploy Elasticsearch¶
- Deploy `Elasticsearch` using `Helm`:
helm repo add elastic https://helm.elastic.co
helm repo update
helm install elasticsearch elastic/elasticsearch --set replicas=1 --set minimumMasterNodes=1
- Wait for the pod to be ready and check its status:
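The command is missing here; the elastic/elasticsearch chart labels its pods `app=elasticsearch-master` (an assumption based on the chart's defaults):

```shell
# Watch the elasticsearch pod until it becomes Ready (1/1)
kubectl get pods -l app=elasticsearch-master --watch
```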
Step 03 - Deploy Kibana¶
- Deploy `Kibana` using `Helm`:
- Forward the `Kibana` port:
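The commands are missing here; using the elastic Helm repo added in the previous step (the Service name `kibana-kibana` is the chart's default for a release named `kibana` - an assumption, adjust if yours differs):

```shell
# Install Kibana from the elastic Helm repo
helm install kibana elastic/kibana

# Forward the Kibana port to localhost
kubectl port-forward service/kibana-kibana 5601:5601
```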
If you are running this lab in Google Cloud Shell:
- After running the port-forward command above, click the Web Preview button in the Cloud Shell toolbar (usually at the top right).
- Enter port `5601` when prompted.
- This will open `Kibana` in a new browser tab at a URL like `https://<cloudshell-id>.shell.cloud.google.com/?port=5601`.
- If you see a warning about an untrusted connection, you can safely proceed.
- Access `Kibana` at http://localhost:5601 (if running locally) or via the Cloud Shell Web Preview, as explained above.
Step 04 - Deploy Fluentd¶
- Deploy `Fluentd` as a `DaemonSet` to collect logs from all nodes and forward them to `Elasticsearch`.
kubectl apply -f https://raw.githubusercontent.com/fluent/fluentd-kubernetes-daemonset/master/fluentd-daemonset-elasticsearch-rbac.yaml
- Check that `Fluentd` pods are running:
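The check command is missing; the fluentd-kubernetes-daemonset manifest deploys into `kube-system` with the label `k8s-app=fluentd-logging` (assumed from that repo's defaults):

```shell
# Fluentd runs as a DaemonSet, one pod per node
kubectl get pods -n kube-system -l k8s-app=fluentd-logging
```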
Step 05 - Generate and View Logs¶
- Access the `Nginx` service to generate logs:
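The commands are missing; since the nginx Service was exposed as a NodePort, a few curl requests will generate access-log entries:

```shell
# Discover the node IP and the NodePort assigned to the nginx Service
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
NODE_PORT=$(kubectl get service nginx -o jsonpath='{.spec.ports[0].nodePort}')

# Hit the service a few times to produce log entries
for i in $(seq 1 10); do
  curl -s "http://${NODE_IP}:${NODE_PORT}/" > /dev/null
done
```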
In Kibana, configure an index pattern to view logs:
- Open Kibana in your browser (using the Cloud Shell Web Preview as described above).
- In the left menu, click Stack Management > Kibana > Index Patterns.
- Click Create index pattern.
- In the "Index pattern" field, enter `fluentd-*` (or `logstash-*` if your logs use that prefix).
- Click Next step.
- For the time field, select `@timestamp` and click Create index pattern.
Explore the logs, search, and visualize traffic.
Troubleshooting¶
Pods not starting:¶
- Check pod status and logs:
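The commands are missing; a generic starting point (the pod name below is a placeholder):

```shell
# List pods and their current state
kubectl get pods

# Inspect events and configuration of a failing pod
POD=nginx-xxxxx   # placeholder - use your actual pod name
kubectl describe pod "$POD"

# View the pod's logs
kubectl logs "$POD"
```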
Kibana not reachable:¶
- Ensure port-forward is running and no firewall is blocking port 5601.
No logs in Kibana:¶
- Check Fluentd and Elasticsearch pod logs for errors.
- Ensure index pattern is set up correctly in Kibana.
Cleanup¶
- To remove all resources created by this lab:
helm uninstall elasticsearch
helm uninstall kibana
kubectl delete deployment nginx
kubectl delete service nginx
kubectl delete -f https://raw.githubusercontent.com/fluent/fluentd-kubernetes-daemonset/master/fluentd-daemonset-elasticsearch-rbac.yaml
Next Steps¶
- Try deploying other logging stacks like `Loki` + `Grafana`.
- Explore log aggregation, alerting, and retention policies.
- Integrate logging with monitoring and alerting tools.
- Read more in the Kubernetes logging documentation.
Prometheus and Grafana Monitoring Lab¶
- In this lab, we will learn how to set up and configure `Prometheus` and `Grafana` for monitoring a Kubernetes cluster.
- You will install `Prometheus` to collect metrics from the cluster and `Grafana` to visualize those metrics.
- By the end of this lab, you will have a functional monitoring stack that provides insights into the health and performance of your Kubernetes environment.
What will we learn?¶
- How to install Prometheus and Grafana on a Kubernetes cluster
- How to configure Prometheus to collect cluster metrics
- How to set up Grafana dashboards for visualizing metrics
- Monitoring cluster health, application performance, and infrastructure
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster
- `helm` installed (v3+)
Prometheus and Grafana Setup and Configuration Guide¶
- This guide serves as a comprehensive walkthrough of the steps to set up `Prometheus` and `Grafana` on your Kubernetes cluster.
- It includes hands-on steps for installing `Prometheus` using `Helm`, configuring `Prometheus` to collect metrics, setting up `Grafana` to visualize key metrics, and automating the setup using a bash script.
Introduction to Prometheus and Grafana¶
Prometheus¶
- `Prometheus` is an open-source systems monitoring and alerting toolkit designed for reliability and scalability.
- It collects and stores metrics as time-series data, providing powerful querying capabilities.
- It is commonly used in Kubernetes environments for monitoring cluster health, application performance, and infrastructure.
Grafana¶
- `Grafana` is a popular open-source data visualization tool that works well with `Prometheus`.
- It allows you to create dashboards and visualize metrics in real time, providing insights into system performance and application health.
- `Grafana` supports a wide range of visualization options, including graphs, heatmaps, tables, and more.
- Together, `Prometheus` and `Grafana` provide a powerful stack for monitoring and alerting in Kubernetes.
Part 01 - Installing Prometheus and Grafana¶
Helm Charts
We will use `Helm` to deploy `Prometheus` and `Grafana`.
Step 01 - Add Prometheus and Grafana Helm Repositories¶
- Let’s add the official `Helm` chart repositories for `Prometheus` and `Grafana`:
# Add Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Add Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
# Update your Helm repositories to make sure they are up-to-date
helm repo update
Step 02 - Install Prometheus Stack¶
- `Prometheus` is installed using the `kube-prometheus-stack` `Helm` chart.
# Install the kube-prometheus-stack chart, which bundles
# Prometheus, Alertmanager and Node Exporter.
# The `monitoring` namespace is created if it does not exist.
helm install prometheus \
--namespace monitoring \
--create-namespace \
prometheus-community/kube-prometheus-stack
# Verify the status of the release using the following:
helm status prometheus -n monitoring
Step 03 - Install Grafana¶
- Now, let’s install `Grafana`.
- `Grafana` will be deployed in the same `monitoring` namespace.
helm install grafana grafana/grafana --namespace monitoring
# Verify the status of the release using the following:
helm status grafana -n monitoring
Step 04 - Access Grafana¶
- `Grafana` will expose a service in your Kubernetes cluster.
- To access it, you need the admin password and port forwarding.
# In order to get the Grafana admin password, run the following command:
kubectl get secret grafana \
--namespace monitoring \
-o jsonpath='{.data.admin-password}' | base64 --decode ; echo
# Set up port forwarding so you can access the service from your browser
kubectl port-forward \
--namespace monitoring \
service/grafana 3000:80
- Verify that you can access `Grafana`:
- Open your browser and navigate to http://localhost:3000
- The default login is:
  - Username: `admin`
  - Password: (the password you retrieved earlier)
Accessing Grafana on Google Cloud Shell
If you are running your cluster in Google Cloud Shell, you cannot use localhost for port forwarding. Instead, use the Cloud Shell Web Preview:
- Run the port-forward command as usual:
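That is, the same command shown earlier:

```shell
kubectl port-forward --namespace monitoring service/grafana 3000:80
```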
- In Google Cloud Shell, click the “Web Preview” button (top right) and select “Preview on port 3000”.
- Grafana will open in a new browser tab.
- Username: `admin`
- Password: (the password you retrieved earlier)
Note: You can use any available port (e.g., 3000, 3001) in the port-forward command, just match it in the Web Preview.
Part 02 - Configuring Prometheus¶
- `Prometheus` can collect various metrics from your Kubernetes cluster automatically if the right exporters are enabled.
- The kube-prometheus-stack chart that you installed earlier automatically configures `Prometheus` to scrape a number of Kubernetes components (like `kubelet`, `node-exporter`, and `kube-state-metrics`) for various metrics.
Step 01 - Verify Prometheus Metrics Collection¶
- You can check if `Prometheus` is correctly scraping metrics by navigating to the `Prometheus` web UI.
# Port-forward the Prometheus service:
kubectl port-forward \
--namespace monitoring \
svc/prometheus-operated 9090:9090
- Verify that you can access `Prometheus`:
- Open http://localhost:9090
- In the expression field paste the following:
# This query will show the current status of the `kube-state-metrics` job
up{job="kube-state-metrics"}
Accessing Prometheus on Google Cloud Shell
If you are running your cluster in Google Cloud Shell, you cannot use localhost for port forwarding. Instead, use the Cloud Shell Web Preview:
- Run the port-forward command as usual:
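That is, the same command shown earlier:

```shell
kubectl port-forward --namespace monitoring svc/prometheus-operated 9090:9090
```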
- In Google Cloud Shell, click the “Web Preview” button (top right) and select “Preview on port 9090”.
- Prometheus will open in a new browser tab.
Note: You can use any available port (e.g., 9090, 9091) in the port-forward command, just match it in the Web Preview.
Part 03 - Configuring Grafana¶
- In this part we will set up `Grafana` to display the cluster’s CPU, memory, and request metrics.
- `Grafana` dashboards can be configured to display real-time metrics for CPU, memory, and requests.
- `Prometheus` stores these metrics, and `Grafana` will query `Prometheus` to display them.
Step 01 - Add Prometheus as a Data Source in Grafana¶
- Log into `Grafana` at http://localhost:3000, or use the Cloud Shell Web Preview.
- Click on the hamburger icon on the left sidebar to open the Configuration menu.
- Click on Data Sources.
- Click Add data source and choose Prometheus.
- In the URL field, enter the Prometheus server URL: `http://prometheus-operated:9090`.
- Click Save & Test to confirm that the connection is working.
Step 02 - Create a Dashboard to Display Metrics¶
- The next step is to create a dashboard and panels to display the desired metrics.
- To create a dashboard in `Grafana` for CPU, memory, and requests, do the following:
- In `Grafana`, open the left sidebar menu and select Dashboard.
- Click Add visualization.
- Choose the `Data Source` (as we defined it previously).
- In the panel editor, click on the `Code` option (right side of the query builder).
- Enter the queries below to visualize the metric(s):
Note: To add a new query, click on the `+ Add query` button.
- Save the dashboard.
Demo: Automated stack and dashboard¶
There is a convenience demo under demo/01-stack that:
- Installs `kube-prometheus-stack` and a `Grafana` release into the `monitoring` namespace.
- Provisions a Prometheus datasource in Grafana.
- Uploads a ready-made dashboard that shows CPU, memory, pod counts and HTTP request metrics.
Files:
- demo/01-stack/demo.sh
- demo/01-stack/grafana-dashboards/cluster-metrics-dashboard.json
Quick usage:
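A likely invocation, taking the path from the file list above (verify locally that `demo.sh` is executable):

```shell
cd demo/01-stack
./demo.sh
```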
- CPU Usage:
sum(rate(container_cpu_usage_seconds_total{namespace="default", container!="", container!="POD"}[5m])) by (pod, namespace)
- Memory Usage:
sum(container_memory_usage_bytes{namespace="default", container!="", container!="POD"}) by (pod, namespace)
- Request Count:
sum(rate(http_requests_total{job="kubelet", cluster="", namespace="default"}[5m])) by (pod, namespace)
Step 03 - Get Number of Pods in the Cluster¶
- To track the number of pods running in the cluster, add a new panel with the following query:
# This query counts the number of pods running in all the namespaces
count(kube_pod_info{}) by (namespace)
- Add another query which will count the number of pods under the `monitoring` namespace:
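By analogy with the cluster-wide query above, a namespace-scoped count might look like:

```promql
# Count only the pods in the `monitoring` namespace
count(kube_pod_info{namespace="monitoring"})
```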
Tip
We have already defined a query filtered by namespace above. You can use the same approach to filter by other labels as well.
Step 04: Customize the Panel¶
- Change the visualization by changing the Graph Style
EFK Stack - Elasticsearch, Filebeat, Kibana¶
- The EFK stack is a popular Kubernetes-native logging solution combining Elasticsearch (storage), Filebeat (collection), and Kibana (visualization).
- This lab deploys a file-based processing architecture: Filebeat writes logs to a shared PVC instead of directly to Elasticsearch, decoupling collection from indexing.
- Full air-gapped / offline installation support via Harbor registry is included.
- The entire stack can be deployed via ArgoCD using the App of Apps pattern from Lab 18.
What will we learn?¶
- Deploy Elasticsearch, Filebeat, and Kibana on Kubernetes using Helm
- Implement a file-based log processing pipeline with a CronJob
- Use a shared PersistentVolumeClaim to buffer logs between collection and indexing
- Access Kibana via Nginx Ingress
- Query logs with KQL (Kibana Query Language)
- Deploy the EFK stack via ArgoCD App of Apps (from Lab 18)
- Perform air-gapped offline installation using Harbor as a local registry
What is the EFK Stack?¶
| Component | Role |
|---|---|
| Elasticsearch | Search and analytics engine - stores and indexes log data |
| Filebeat | Lightweight log shipper (DaemonSet) - collects container logs |
| Kibana | Web UI for searching, visualizing, and dashboarding log data |
Why File-Based Processing?¶
Traditional EFK sends logs directly from Filebeat to Elasticsearch. This lab uses an intermediate file approach:
| Aspect | Direct (traditional) | File-Based (this lab) |
|---|---|---|
| Reliability | Logs lost if ES is down | Logs persist on PVC even if ES is down |
| Debugging | No raw log access | Raw JSON files always available |
| Reprocessing | Not possible | Reprocess any time by rerunning the CronJob |
| Monitoring | Single pipeline | Clear separation: collection vs. indexing |
Architecture¶
graph TB
subgraph cluster["Kubernetes Cluster"]
subgraph nodes["All Nodes"]
fb["Filebeat DaemonSet\ncollects /var/log/containers/*.log"]
end
subgraph pods["Application Pods"]
lg["Log Generator\n(3 replicas)"]
other["Other application\npods"]
end
subgraph storage["Shared Storage"]
pvc["PersistentVolumeClaim\n5Gi - /filebeat-logs/"]
end
subgraph processing["Processing"]
cron["Log Processor\n(CronJob - every 2 min)"]
end
subgraph efk["efk namespace"]
es["Elasticsearch\n(StatefulSet)"]
kibana["Kibana\n(Deployment)"]
ing["Nginx Ingress\nkibana.local"]
end
end
user["User / Browser"] --> ing
lg -. logs .-> fb
other -. logs .-> fb
fb --> pvc
pvc --> cron
cron --> es
es --> kibana
ing --> kibana
Data Flow¶
sequenceDiagram
participant App as Application Pods
participant FB as Filebeat (DaemonSet)
participant PVC as Shared PVC
participant Proc as Log Processor (CronJob)
participant ES as Elasticsearch
participant Kib as Kibana
App->>FB: Write stdout/stderr logs
FB->>PVC: Write JSON log files (/filebeat-logs/)
Note over PVC: Files persisted on disk
Proc->>PVC: Read unprocessed files (every 2 min)
Proc->>ES: Bulk-send log entries via REST API
Proc->>PVC: Mark files as processed (keep originals)
Kib->>ES: Query logs via REST API
Directory Structure¶
33-EFK/
├── README.md                      # This file
├── .env                           # Configuration (image tags, Harbor settings)
├── demo.sh                        # Online deployment script
├── monitor.sh                     # Monitoring and testing script
├── access-kibana.sh               # Kibana access helper
├── fix-kibana.sh                  # Dashboard re-import utility
├── airgap.sh                      # Offline/air-gapped installation orchestrator
│
├── argocd-apps/                   # ArgoCD Application manifests (App of Apps)
│   ├── elasticsearch.yaml         # ArgoCD App: Elasticsearch Helm chart
│   ├── filebeat.yaml              # ArgoCD App: Filebeat Helm chart (wave 1)
│   ├── kibana.yaml                # ArgoCD App: Kibana Helm chart (wave 1)
│   ├── log-generator.yaml         # ArgoCD App: Log Generator Helm chart (wave 2)
│   └── log-processor.yaml         # ArgoCD App: Log Processor Helm chart (wave 2)
│
├── helm/
│   ├── elasticsearch/             # Elasticsearch Helm chart
│   ├── filebeat/                  # Filebeat Helm chart (file output mode)
│   ├── kibana/                    # Kibana Helm chart (+ dashboard importer)
│   │   └── dashboards/            # 8 pre-built NDJSON dashboard files
│   ├── log-processor/             # Log Processor CronJob Helm chart
│   └── log-generator/             # Log Generator Helm chart
│
├── scripts/
│   ├── common.sh                  # Shared functions and color helpers
│   ├── install-harbor.sh          # Install Harbor registry on K8s
│   ├── install-ingress.sh         # Install Nginx Ingress Controller
│   ├── retag-and-push-images.sh   # Retag images for Harbor and push
│   ├── upload-charts-to-harbor.sh # Push Helm charts to Harbor OCI
│   ├── generate-harbor-values.sh  # Generate registry override values
│   ├── offline-install.sh         # Install EFK from Harbor
│   └── verify-deployment.sh       # Verify offline deployment
│
└── artifacts/                     # Offline artifacts (generated by airgap.sh)
    ├── download-all.sh
    ├── images/                    # Container images as .tar files
    ├── charts/                    # Packaged Helm charts (.tgz)
    └── harbor/                    # Harbor chart and images
Prerequisites¶
- Kubernetes cluster (v1.20+) with at least 8 GB RAM
- `kubectl` configured to access your cluster
- `Helm 3.x` installed
- (Optional) Nginx Ingress Controller for Kibana access
# Install kubectl (macOS)
brew install kubectl
# Install Helm
brew install helm
# Verify
kubectl version --client
helm version
Lab¶
Part 01 - Deploy the EFK Stack¶
01. Deploy All Components¶
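The deployment is driven by the lab's `demo.sh` (run it from the `33-EFK` directory; path assumed from the structure above):

```shell
./demo.sh
```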
The script will:
- Create the `efk` namespace
- Deploy Elasticsearch (StatefulSet)
- Deploy Kibana with Nginx Ingress
- Deploy Filebeat DaemonSet (writes logs to PVC files)
- Deploy Log Generator pods (3 replicas generating structured logs)
- Deploy Log Processor CronJob + run an initial Job immediately
- Wait for all pods to be ready
- Print Kibana access information
02. Access Kibana¶
Option A - Ingress (Recommended)¶
# Get the Ingress IP
INGRESS_IP=$(kubectl get ingress -n efk kibana \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# Add to /etc/hosts if not already present
grep -q "kibana.local" /etc/hosts || \
echo "${INGRESS_IP:-192.168.49.2} kibana.local" | sudo tee -a /etc/hosts
open http://kibana.local
Option B - Port-Forward¶
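A sketch, assuming the Kibana service in the `efk` namespace is named `kibana`:

```shell
kubectl port-forward -n efk svc/kibana 5601:5601
# Then open http://localhost:5601
```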
Part 02 - Kibana Dashboards¶
Dashboards are automatically imported during deployment via the Kibana Helm chart’s dashboard importer init container.
Available Dashboards (8)¶
| Dashboard | Description |
|---|---|
| General Logs Dashboard | Overview of all logs by level, component, and time |
| Error Analysis Dashboard | Comprehensive error monitoring and analysis |
| Warning Analysis Dashboard | Track and analyze WARNING level logs |
| Component Activity Dashboard | Detailed per-component log breakdown |
| Performance Overview Dashboard | Key metrics, volume trends, and health indicators |
| HTTP Access Dashboard | HTTP request logs and access patterns |
| K8s Monitoring Dashboard | Kubernetes cluster monitoring |
| APM Dashboard | Application performance monitoring |
Access Steps¶
- Open Kibana at `http://kibana.local`
- Click Dashboard in the left sidebar
- Select any dashboard to view logs
Verify or Re-import Dashboards¶
# Check import job status
kubectl logs -n efk -l app=kibana,component=dashboard-importer
# Manually re-import by upgrading the chart
helm upgrade kibana ./helm/kibana -n efk
Part 03 - Log Pipeline¶
Log Generator¶
The log generator creates structured JSON logs with varying severity levels and simulated service names:
{
"timestamp": "2026-02-22T10:30:45Z",
"level": "ERROR",
"component": "PaymentService",
"message": "Transaction failed: timeout",
"request_id": "req-1740217845-12345",
"counter": 42
}
Components that generate logs: UserService, OrderService, PaymentService, AuthService, DatabaseService, CacheService
File-Based Pipeline Flow¶
graph LR
fb["Filebeat DaemonSet"] -->|"writes JSON"| pvc["Shared PVC\n/filebeat-logs/"]
pvc -->|"reads every 2min"| proc["Log Processor\n(CronJob)"]
proc -->|"bulk REST API"| es["Elasticsearch"]
proc -. "keeps original" .-> pvc
es --> kib["Kibana"]
Monitor the Pipeline¶
# Interactive monitor
./monitor.sh
# Quick summary
./monitor.sh summary
# End-to-end pipeline test
./monitor.sh test
# Full detailed report
./monitor.sh full
Manual Pipeline Checks¶
# Verify Filebeat is writing log files
kubectl exec -n efk -l app=filebeat -- ls -lh /filebeat-logs/
# Count documents in Elasticsearch
kubectl exec -n efk elasticsearch-0 -- \
curl -s http://localhost:9200/filebeat-*/_count
# View CronJob schedule
kubectl get cronjob -n efk
# Manually trigger the log processor and follow its logs
JOB_NAME=manual-$(date +%s)
kubectl create job -n efk --from=cronjob/log-processor $JOB_NAME
kubectl logs -n efk job/$JOB_NAME --tail=30
Part 04 - Kibana Query Language (KQL)¶
# Show only ERROR logs
json.level: "ERROR"
# Show logs from a specific component
json.component: "PaymentService"
# Show ERROR or WARN logs
json.level: ("ERROR" OR "WARN")
# Show logs with a keyword in the message
json.message: *timeout*
# Combine multiple conditions
json.level: "ERROR" AND json.component: "PaymentService"
Useful Elasticsearch API Queries¶
# List all indices
kubectl exec -n efk elasticsearch-0 -- \
curl -s http://localhost:9200/_cat/indices?v
# Cluster health
kubectl exec -n efk elasticsearch-0 -- \
curl -s http://localhost:9200/_cluster/health?pretty
# Count documents
kubectl exec -n efk elasticsearch-0 -- \
curl -s http://localhost:9200/filebeat-*/_count?pretty
# Recent 5 log entries
kubectl exec -n efk elasticsearch-0 -- \
curl -s "http://localhost:9200/filebeat-*/_search?size=5&sort=@timestamp:desc&pretty"
Part 05 - Deploy via ArgoCD (App of Apps)¶
The EFK stack can be deployed via ArgoCD from Lab 18 using the App of Apps pattern. The argocd-apps/ directory contains individual ArgoCD Application manifests for each Helm chart.
Deploy via App of Apps (from Lab 18)¶
# From Lab 18 directory - deploy the root App of Apps
kubectl apply -f ../18-ArgoCD/apps/app-of-apps.yaml
ArgoCD will discover Labs/33-EFK/argocd-apps/ and deploy each component with proper sync waves:
- Wave 0 - Elasticsearch (deployed first)
- Wave 1 - Filebeat, Kibana (deployed after Elasticsearch is healthy)
- Wave 2 - Log Generator, Log Processor (deployed last)
Deploy EFK App of Apps Directly¶
# Apply only the EFK App of Apps (without the full Lab 18 setup)
kubectl apply -f argocd-apps/elasticsearch.yaml
kubectl apply -f argocd-apps/filebeat.yaml
kubectl apply -f argocd-apps/kibana.yaml
kubectl apply -f argocd-apps/log-generator.yaml
kubectl apply -f argocd-apps/log-processor.yaml
Monitor via ArgoCD¶
argocd app list | grep efk
argocd app get efk-elasticsearch
kubectl get applications -n argocd | grep efk
Part 06 - Air-Gapped / Offline Installation¶
This lab supports fully offline deployment using Harbor as a local Docker and Helm chart registry.
Air-Gapped Flow¶
graph LR
subgraph internet["Internet-Connected Machine"]
prep["1. ./airgap.sh prepare\nPull images + package charts"]
end
subgraph transfer["Transfer"]
tar["artifacts/ folder\n(images + charts + harbor)"]
end
subgraph airgap["Air-Gapped Cluster"]
install["2. ./airgap.sh install\nHarbor + push + EFK deploy"]
verify["3. ./airgap.sh verify\nValidate all components"]
end
prep --> tar --> install --> verify
Configuration (.env)¶
# Harbor settings
HARBOR_DOMAIN="harbor.local"
HARBOR_ADMIN_PASSWORD="Harbor12345"
HARBOR_PROJECT="efk"
# Image versions
ES_TAG="8.11.0"
FILEBEAT_TAG="8.11.0"
KIBANA_TAG="8.11.0"
Step 1 - Prepare Artifacts (requires internet)¶
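On the internet-connected machine:

```shell
./airgap.sh prepare
```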
Step 2 - Transfer to Air-Gapped Machine¶
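One way to move the generated `artifacts/` directory (host and destination path are illustrative):

```shell
tar czf efk-artifacts.tar.gz artifacts/
scp efk-artifacts.tar.gz user@airgapped-host:/opt/efk/
```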
Step 3 - Full Offline Install¶
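On the air-gapped machine, run the full install (Harbor + push + EFK):

```shell
./airgap.sh install
```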
Step 4 - Verify¶
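Finally, validate all components:

```shell
./airgap.sh verify
```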
All Air-Gap Commands¶
./airgap.sh prepare # Download artifacts (needs internet)
./airgap.sh install # Full install: Harbor + push + EFK
./airgap.sh harbor # Install Harbor registry only
./airgap.sh push # Push images and charts to Harbor
./airgap.sh efk # Install EFK from Harbor
./airgap.sh verify # Run verification tests
./airgap.sh status # Show deployment status
./airgap.sh cleanup # Remove EFK (keep Harbor)
./airgap.sh cleanup-all # Remove everything
Part 07 - Configuration¶
Elasticsearch¶
Edit helm/elasticsearch/values.yaml:
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "2Gi"
cpu: "1000m"
persistence:
size: 10Gi
Log Processor (CronJob Schedule)¶
Edit helm/log-processor/values.yaml:
# How often to process log files
schedule: "*/2 * * * *" # Every 2 minutes (default)
# schedule: "*/1 * * * *" # Every 1 minute
# schedule: "*/5 * * * *" # Every 5 minutes
processing:
keepOriginalFiles: true # Keep files in /filebeat-logs/ for inspection
createBackups: true # Also create copies in /filebeat-logs/processed/
Log Generator¶
Edit helm/log-generator/values.yaml:
Part 08 - Troubleshooting¶
Pods Not Starting¶
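For example:

```shell
kubectl get pods -n efk
kubectl describe pod -n efk <pod-name>
kubectl logs -n efk <pod-name>
```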
Filebeat Not Collecting Logs¶
# Check DaemonSet coverage
kubectl get daemonset -n efk filebeat
# Verify RBAC
kubectl get clusterrole filebeat
kubectl get clusterrolebinding filebeat
# Check log files are being written
kubectl exec -n efk -l app=filebeat -- ls -lh /filebeat-logs/
Log Processor Not Running¶
# Check CronJob status
kubectl get cronjob -n efk log-processor
# View recent job executions
kubectl get jobs -n efk -l app=log-processor --sort-by=.metadata.creationTimestamp
# View processor logs
kubectl logs -n efk -l app=log-processor --tail=100
# Manually trigger for testing
kubectl create job -n efk --from=cronjob/log-processor test-run-$(date +%s)
No Data in Kibana¶
# 1. Verify Filebeat is writing files
kubectl exec -n efk -l app=filebeat -- ls -lh /filebeat-logs/
# 2. Check Log Processor has run
kubectl get jobs -n efk -l app=log-processor
# 3. Confirm data in Elasticsearch
kubectl exec -n efk elasticsearch-0 -- \
curl -s http://localhost:9200/filebeat-*/_count
# 4. Check index pattern in Kibana matches: filebeat-*
# 5. Adjust the time range in Kibana (top right corner)
Note
The first log data appears in Kibana after the Log Processor CronJob runs (up to 2 minutes after deployment).
Kibana Dashboard Import Failed¶
kubectl logs -n efk -l app=kibana,component=dashboard-importer
# Re-import by upgrading the chart
helm upgrade kibana ./helm/kibana -n efk
Cleanup¶
# Full cleanup
./demo.sh cleanup
# Manual cleanup
helm uninstall elasticsearch filebeat kibana log-processor log-generator -n efk
kubectl delete namespace efk
Resources¶
Helm Chart¶

- Welcome to the `Helm` Chart hands-on lab! In this tutorial, you’ll learn the essentials of `Helm` (version 3), the package manager for Kubernetes.
- You’ll build, package, install, and manage applications using `Helm` charts, gaining practical experience with real Kubernetes resources.
What will we learn?¶
- What `Helm` is and why it is useful
- `Helm` chart structure and key files
- Common `Helm` commands for managing releases
- How to create, package, install, upgrade, and roll back a `Helm` chart
- Built-in objects and named templates
- Advanced features: hooks, dependencies, conditionals, and testing
- Troubleshooting and best practices
Official Documentation & References¶
| Resource | Link |
|---|---|
| Helm Official Docs | helm.sh/docs |
| Chart Template Guide | helm.sh/docs/chart_template_guide |
| Built-in Objects | helm.sh/docs/chart_template_guide/builtin_objects |
| Values Files | helm.sh/docs/chart_template_guide/values_files |
| Template Functions & Pipelines | helm.sh/docs/chart_template_guide/functions_and_pipelines |
| Flow Control (`if`, `range`, `with`) | helm.sh/docs/chart_template_guide/control_structures |
| Named Templates (`_helpers.tpl`) | helm.sh/docs/chart_template_guide/named_templates |
| Chart Hooks | helm.sh/docs/topics/charts_hooks |
| Chart Dependencies | helm.sh/docs/helm/helm_dependency |
| Chart Tests | helm.sh/docs/topics/chart_tests |
| Chart Best Practices | helm.sh/docs/chart_best_practices |
| Go Template Language | pkg.go.dev/text/template |
| Sprig Template Functions | masterminds.github.io/sprig |
| Artifact Hub (Chart Repository) | artifacthub.io |
| Helm Cheat Sheet | helm.sh/docs/intro/cheatsheet |
Introduction¶
- `Helm` is the package manager for Kubernetes.
- It simplifies the deployment, management, and upgrade of applications on your Kubernetes cluster.
- `Helm` helps you manage Kubernetes applications by providing a way to define, install, and upgrade complex Kubernetes applications.
- When packaging applications as `Helm` charts, you gain a standardized and reusable approach for deploying and managing your services.
- A `Helm` chart consists of a few files that define the Kubernetes resources that will be created when the chart is installed.
- These files include:
  - The `Chart.yaml` file, which contains metadata about the chart, such as its name and version, and the chart’s dependencies and maintainers.
  - The `values.yaml` file, which contains the configuration values for the chart.
  - The `templates` directory, which contains the Kubernetes resource templates used to create the actual resources in the cluster.
Terminology¶
- A `Helm` package is called a chart.
- Charts are versioned, shareable packages that contain all the Kubernetes resources needed to run an application.
- A specific instance of a chart is called a release.
- Each release is a deployed version of a chart, with its own configuration, resources, and revision history.
- A collection of charts is stored in a `Helm` repository.
- `Helm` charts can be hosted in public or private repositories for easy sharing and distribution.
Chart files and folders¶
| Filename/Folder | Description |
|---|---|
| `Chart.yaml` | Contains metadata about the chart, including its name, version, dependencies, and maintainers. |
| `Chart.lock` | Lock file listing exact versions of resolved dependencies. |
| `values.yaml` | Defines default configuration values for the chart. Users can override these values during installation. |
| `values.schema.json` | Optional JSON Schema for validating values.yaml structure. |
| `templates/` | Directory containing Kubernetes manifest templates written in the Go template language. |
| `templates/NOTES.txt` | A plain text file containing usage notes displayed after installation. |
| `templates/_helpers.tpl` | A file containing reusable named templates (partials). |
| `templates/tests/` | Directory containing test pod definitions for `helm test`. |
| `charts/` | Directory containing dependencies (subcharts) of the chart. |
| `crds/` | Directory containing Custom Resource Definitions (installed before templates). |
| `README.md` | Documentation for the chart, explaining how to use and configure it. |
Git HELM chart repo structure¶
While there are many ways to structure your Helm charts in a Git repository, here are the two most common patterns:
Pattern 1: One Repo per Chart¶
- Structure: The root of the repository contains the `Chart.yaml`, `values.yaml`, and `templates/` folder.
- Use Case: Best for microservices where each service has its own repository and its own chart.
- CI/CD: The chart is versioned and released alongside the application code.
my-app/
├── Chart.yaml
├── values.yaml
├── templates/
├── src/            # Application source code
└── ...
Pattern 2: Dedicated Charts Repository (Monorepo)¶
- Structure: A central repository containing multiple charts in a `charts/` directory.
- Use Case: Best for managing infrastructure charts (e.g., redis, postgres) or when you want centralized management of all your organization’s charts.
- Hosting: Often used with GitHub Pages to host the chart repository index (`index.yaml`) and packaged charts (`.tgz`).
my-charts-repo/
├── charts/
│   ├── redis/
│   │   ├── Chart.yaml
│   │   └── ...
│   └── frontend/
│       ├── Chart.yaml
│       └── ...
├── docs/           # Contains generated index.yaml and .tgz files (GitHub Pages source)
└── README.md
GitHub Pages as a Helm Repository¶
You can easily turn your Git repository into a Helm Chart Repository using GitHub Pages:
- Docs Folder: Create a `docs` folder in your repo.
- Package: Run `helm package ./charts/mychart -d ./docs`.
- Index: Run `helm repo index ./docs --url https://<username>.github.io/<repo-name>/`.
- Publish: Enable GitHub Pages for the `docs` folder.
Users can then add your repo:
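For example (repository alias is illustrative):

```shell
helm repo add myrepo https://<username>.github.io/<repo-name>/
helm repo update
```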
codewizard-helm-demo Helm Chart structure¶
- Chart.yaml # Defines chart metadata
- values.yaml # Default configuration values
- templates/ # Deployment templates using the Go templating language
  - _helpers.tpl # Named templates (partials) used across templates
  - Namespace.yaml # Namespace manifest template
  - ConfigMap.yaml # ConfigMap manifest template
  - Deployment.yaml # Deployment manifest template
  - Service.yaml # Service manifest template
- README.md # Documentation for your chart
Common Helm Commands¶
Below are the most common Helm commands you’ll use when working with Helm charts. Each command includes syntax, description, and detailed usage examples.
helm create - Create a new chart
Syntax: helm create <chart-name>
Description: Creates a new Helm chart with the specified name. This command generates a chart directory with a standard structure including default templates, values.yaml, and Chart.yaml.
- Creates a new chart directory with a standard structure
- Includes default templates, values.yaml, and Chart.yaml
- Provides a starting point that follows Helm best practices
- You can customize the generated files to match your application needs

# Create a new chart named 'myapp'
helm create myapp
# View the generated structure
tree myapp
# Output shows:
# myapp/
# ├── Chart.yaml
# ├── values.yaml
# ├── charts/
# └── templates/
#     ├── NOTES.txt
#     ├── _helpers.tpl
#     ├── deployment.yaml
#     ├── service.yaml
#     └── ...
helm install - Install a chart
Syntax: helm install <release-name> <chart-path>
Description: Installs a Helm chart to your Kubernetes cluster, creating a new release with the specified name.
- Deploys a chart to your Kubernetes cluster
- Creates a new release with a unique name
- Can override values using `--set` or `-f` flags
- Use `--dry-run` to preview changes without applying them

# Basic install
helm install myrelease ./myapp
# Install with custom values file
helm install myrelease ./myapp -f custom-values.yaml
# Install with inline value overrides
helm install myrelease ./myapp --set replicaCount=3
# Install in a specific namespace
helm install myrelease ./myapp --namespace production --create-namespace
# Dry run to see what would be installed
helm install myrelease ./myapp --dry-run --debug
# Install from a packaged chart
helm install myrelease myapp-1.0.0.tgz
# Install with a generated name
helm install ./myapp --generate-name
# Wait for all resources to be ready before marking release as successful
helm install myrelease ./myapp --wait --timeout 5m
helm upgrade - Upgrade a release
Syntax: helm upgrade <release-name> <chart-path>
Description: Upgrades an installed release with a new version of a chart or updated configuration values.
- Updates an existing release with new configurations or chart version
- Maintains revision history for rollback capability
- Can use `--install` to install if the release doesn’t exist
- Supports value overrides like the install command

# Basic upgrade
helm upgrade myrelease ./myapp
# Upgrade with new values
helm upgrade myrelease ./myapp -f production-values.yaml
# Upgrade or install if not exists
helm upgrade myrelease ./myapp --install
# Upgrade with specific values
helm upgrade myrelease ./myapp --set image.tag=v2.0.0
# Force resource updates even if unchanged
helm upgrade myrelease ./myapp --force
# Reuse previous values and merge with new ones
helm upgrade myrelease ./myapp --reuse-values --set newKey=newValue
# Reset values to chart defaults
helm upgrade myrelease ./myapp --reset-values
# Wait for upgrade to complete
helm upgrade myrelease ./myapp --wait --timeout 10m
# Atomic upgrade - rollback on failure
helm upgrade myrelease ./myapp --atomic --timeout 5m
helm uninstall - Remove a release
Syntax: helm uninstall <release-name>
Description: Uninstalls a release from the Kubernetes cluster, removing all associated resources.
- Deletes a release and all associated Kubernetes resources
- Removes the release from Helm’s history by default
- Use `--keep-history` to retain release history for potential restoration
- Respects hook deletion policies defined in templates

```shell
# Basic uninstall
helm uninstall myrelease

# Uninstall but keep history (allows rollback)
helm uninstall myrelease --keep-history

# Uninstall from specific namespace
helm uninstall myrelease --namespace production

# Uninstall with custom timeout
helm uninstall myrelease --timeout 5m

# Dry run - see what would be deleted
helm uninstall myrelease --dry-run

# Uninstall and wait for all resources to be deleted
helm uninstall myrelease --wait
```
helm list - List releases
Syntax: helm list
Description: Lists all installed Helm releases in the current or specified namespace.
- Shows all releases in the current namespace
- Displays release name, namespace, revision, status, and chart info
- Supports filtering and output formatting options
- Use `--all-namespaces` to see releases across all namespaces

```shell
# List all releases in current namespace
helm list

# List all releases across all namespaces
helm list --all-namespaces

# List releases in specific namespace
helm list --namespace production

# Show all releases including uninstalled (if history was kept)
helm list --all

# Filter by status
helm list --deployed
helm list --failed
helm list --pending

# Output as JSON
helm list -o json

# Output as YAML
helm list -o yaml

# Filter releases by name pattern
helm list --filter 'myapp.*'

# Limit number of results
helm list --max 10

# Sort by release date
helm list --date
```
helm status - Show release status
Syntax: helm status <release-name>
Description: Shows the status of a deployed Helm release including resource information and deployment details.
- Displays detailed information about a deployed release
- Shows resource status, last deployment time, and revision number
- Includes NOTES.txt content if present
- Useful for debugging and verifying deployments

```shell
# Show status of a release
helm status myrelease

# Show status from specific namespace
helm status myrelease --namespace production

# Show status at specific revision
helm status myrelease --revision 2

# Output as JSON
helm status myrelease -o json

# Output as YAML
helm status myrelease -o yaml

# Also show the release description
helm status myrelease --show-desc
```
helm rollback - Rollback to previous revision
Syntax: helm rollback <release-name> [revision]
Description: Rolls back a release to a previous revision.
- Reverts a release to a previous revision
- Useful for quick recovery from failed upgrades
- Creates a new revision (rollback is tracked in history)
- Can roll back to any previously deployed revision

```shell
# Rollback to previous revision
helm rollback myrelease

# Rollback to specific revision
helm rollback myrelease 3

# Rollback with timeout
helm rollback myrelease 2 --timeout 5m

# Wait for rollback to complete
helm rollback myrelease --wait

# Force rollback even if resources haven't changed
helm rollback myrelease --force

# Dry run - see what would be rolled back
helm rollback myrelease --dry-run

# Recreate pods during rollback
helm rollback myrelease --recreate-pods

# Delete new resources created by the rollback if it fails
helm rollback myrelease --cleanup-on-fail
```
helm get all - Get release information
Syntax: helm get all <release-name>
Description: Retrieves all information about a deployed release including templates, values, hooks, and notes.
- Retrieves all information about a release
- Shows manifest, values, hooks, and notes
- Useful for debugging and understanding what was deployed
- Can retrieve information from specific revisions

```shell
# Get all info about a release
helm get all myrelease

# Get all info from specific revision
helm get all myrelease --revision 2

# Get all info from specific namespace
helm get all myrelease --namespace production

# Render the output through a Go template
helm get all myrelease --template '{{.Release.Manifest}}'
```
helm get values - Get release values
Syntax: helm get values <release-name>
Description: Shows the user-supplied values for a release.
- Shows the values that were used for a specific release
- Displays only user-supplied values by default
- Use `--all` to see all values including defaults
- Useful for understanding the current configuration

```shell
# Get user-supplied values
helm get values myrelease

# Get all values (including defaults)
helm get values myrelease --all

# Get values from specific revision
helm get values myrelease --revision 2

# Output as JSON
helm get values myrelease -o json

# Output as YAML
helm get values myrelease -o yaml

# Save values to file
helm get values myrelease > current-values.yaml
```
helm show values - Show chart default values
Syntax: helm show values <chart-name>
Description: Shows the default values of a Helm chart before installation.
- Displays the default values.yaml from a chart
- Works with local charts, remote charts, or chart repositories
- Useful for understanding available configuration options
- Shows values before installation

```shell
# Show default values of local chart
helm show values ./myapp

# Show values from packaged chart
helm show values myapp-1.0.0.tgz

# Show values from chart repository
helm show values bitnami/nginx

# Show values at specific version
helm show values bitnami/nginx --version 15.0.0

# Save default values to file
helm show values ./myapp > default-values.yaml
```
helm template - Render templates locally
Syntax: helm template <release-name> <chart-path>
Description: Renders chart templates locally without installing to the cluster.
- Renders chart templates locally without connecting to Kubernetes
- Outputs rendered YAML manifests to stdout
- Useful for debugging templates and previewing changes
- Does not require cluster access

```shell
# Render templates to stdout
helm template myrelease ./myapp

# Render with custom values
helm template myrelease ./myapp -f custom-values.yaml

# Render with inline values
helm template myrelease ./myapp --set replicaCount=3

# Render and save to file
helm template myrelease ./myapp > rendered-manifests.yaml

# Show only a specific template
helm template myrelease ./myapp --show-only templates/deployment.yaml

# Debug mode - show more information
helm template myrelease ./myapp --debug

# Validate rendered output against the cluster
helm template myrelease ./myapp --validate

# Include CRDs in output
helm template myrelease ./myapp --include-crds

# Render for specific Kubernetes version
helm template myrelease ./myapp --kube-version 1.28.0
```
helm lint - Validate chart
Syntax: helm lint <chart-path>
Description: Runs a series of tests to verify that the chart is well-formed and follows best practices.
- Runs tests to verify chart is well-formed
- Checks Chart.yaml, values.yaml, and template syntax
- Identifies common errors and issues
- Should be run before packaging or installing

```shell
# Lint a chart
helm lint ./myapp

# Lint with custom values
helm lint ./myapp -f custom-values.yaml

# Lint with inline values
helm lint ./myapp --set replicaCount=3

# Strict linting (fail on warnings)
helm lint ./myapp --strict

# Lint with debug output
helm lint ./myapp --debug

# Lint multiple charts
helm lint ./myapp ./anotherapp
```
helm history - Show release history
Syntax: helm history <release-name>
Description: Prints historical revisions for a given release.
- Displays revision history for a release
- Shows revision number, update time, status, and description
- Useful for understanding what changed and when
- Helps identify which revision to roll back to

```shell
# Show release history
helm history myrelease

# Show history from specific namespace
helm history myrelease --namespace production

# Limit the number of revisions shown (default: 256)
helm history myrelease --max 100

# Output as JSON
helm history myrelease -o json

# Output as YAML
helm history myrelease -o yaml

# Output as table (default)
helm history myrelease -o table
```
helm test - Run release tests
Syntax: helm test <release-name>
Description: Runs the tests defined in a chart for a release.
- Executes tests defined in chart’s templates/tests/ directory
- Tests are Kubernetes pods with the `helm.sh/hook: test` annotation
- Validates that a release is working correctly
- Returns an exit code based on test success/failure

```shell
# Run tests for a release
helm test myrelease

# Run tests from specific namespace
helm test myrelease --namespace production

# Run tests with timeout
helm test myrelease --timeout 5m

# Show test logs
helm test myrelease --logs

# Filter which tests to run
helm test myrelease --filter name=test-connection
```
helm dependency update - Update chart dependencies
Syntax: helm dependency update <chart-path>
Description: Updates the charts/ directory based on Chart.yaml dependencies.
- Downloads chart dependencies listed in Chart.yaml
- Stores dependencies in the charts/ subdirectory
- Creates or updates Chart.lock file
- Required before packaging or installing charts with dependencies
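A short sketch of the command and its companions (`./myapp` is a placeholder chart path):

```shell
# Download dependencies declared in Chart.yaml into charts/
helm dependency update ./myapp

# Show the dependency status for a chart
helm dependency list ./myapp

# Rebuild charts/ from the existing Chart.lock file
helm dependency build ./myapp
```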
helm repo add - Add chart repository
Syntax: helm repo add <name> <url>
Description: Adds a chart repository to your local Helm configuration.
- Adds a chart repository to your local Helm configuration
- Repositories are stored in ~/.config/helm/repositories.yaml
- Enables searching and installing charts from the repository
- Adds HTTP(S) chart repositories; OCI registries are accessed with `helm registry login` and `oci://` references instead

```shell
# Add a chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami

# Add with authentication
helm repo add myrepo https://charts.example.com --username user --password pass

# Add and force update if exists
helm repo add bitnami https://charts.bitnami.com/bitnami --force-update

# Add repository with custom certificate
helm repo add myrepo https://charts.example.com --ca-file ca.crt

# Add repository skipping TLS verification (not recommended)
helm repo add myrepo https://charts.example.com --insecure-skip-tls-verify

# List all repositories
helm repo list
```
helm repo update - Update repository information
Syntax: helm repo update
Description: Updates information of available charts from chart repositories.
- Updates the local cache of charts from all added repositories
- Fetches the latest available charts and versions
- Should be run periodically to see new chart releases
- Similar to `apt update` or `yum update`
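For example (updating a single named repository requires Helm 3.7+):

```shell
# Refresh the local cache for all added repositories
helm repo update

# Refresh only a specific repository
helm repo update bitnami
```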
helm search repo - Search repositories
Syntax: helm search repo <keyword>
Description: Searches repositories for charts matching a keyword.
- Searches added repositories for charts matching keyword
- Shows chart name, version, app version, and description
- Supports regex patterns for advanced searching
- Only searches locally added repositories

```shell
# Search for charts
helm search repo nginx

# Search with version information
helm search repo nginx --versions

# Search with regex
helm search repo 'nginx.*'

# Show development versions (pre-release, etc.)
helm search repo nginx --devel

# Search with specific version constraint
helm search repo nginx --version "~15.0"

# Output as JSON
helm search repo nginx -o json

# Output as YAML
helm search repo nginx -o yaml

# List all charts without column truncation
helm search repo --max-col-width 0
```
Advanced Concepts¶
Built-in Objects¶
Helm templates have access to several built-in objects. These are the most commonly used:
| Object | Description |
|---|---|
| `.Release.Name` | The name of the release |
| `.Release.Namespace` | The namespace the release is installed into |
| `.Release.Revision` | The revision number of this release (starts at 1) |
| `.Release.IsInstall` | `true` if the current operation is an install |
| `.Release.IsUpgrade` | `true` if the current operation is an upgrade |
| `.Release.Service` | The service rendering the template (always `Helm`) |
| `.Values` | Values passed to the template from `values.yaml` and user overrides |
| `.Chart.Name` | The name of the chart from `Chart.yaml` |
| `.Chart.Version` | The version of the chart |
| `.Chart.AppVersion` | The app version from `Chart.yaml` |
| `.Template.Name` | The namespaced path to the current template file |
| `.Template.BasePath` | The namespaced path to the templates directory |
| `.Files` | Access to non-template files in the chart |
| `.Capabilities` | Information about the Kubernetes cluster capabilities |
Docs: Built-in Objects
Go Template Syntax¶
Helm uses the Go template language with additional Sprig functions. Here’s a quick reference:
Template Delimiters¶
| Delimiter syntax | Meaning |
|---|---|
| `{{ ... }}` | Standard output expression - evaluates and prints the result. |
| `{{- ... }}` | Trim whitespace/newline to the left of the action (left-trim). Useful at the start of a template line. |
| `{{ ... -}}` | Trim whitespace/newline to the right of the action (right-trim). Useful at the end of a template line. |
| `{{- ... -}}` | Trim whitespace/newline on both sides of the action. |
Delimiters
- Whitespace trimming controls whether newlines and spaces immediately before or after template actions appear in the rendered YAML - this is important to produce valid, tidy manifests.
- Prefer `{{-` at the start of a block and `-}}` at the end of a block when you want to avoid blank lines in rendered output.
Example (shows differences in rendered output):
```yaml
# Template A (no trimming)
prefix:
{{ "val" }}
suffix:

# Template B (left-trim)
prefix:
{{- "val" }}
suffix:

# Template C (right-trim)
prefix:
{{ "val" -}}
suffix:

# Template D (both sides trimmed)
prefix:
{{- "val" -}}
suffix:
```
When rendered, trimming removes the surrounding blank lines and keeps YAML indentation correct; use helm template during development to verify the output.
Variables¶
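Variables are assigned with `:=` and keep their value across scope changes, which is useful inside `with` and `range` blocks where `.` is rebound. A minimal sketch (the `myapp.fullname` helper and `.Values.service` keys are illustrative):

```yaml
# Capture a value before `with` rebinds the scope
{{- $fullname := include "myapp.fullname" . }}
{{- with .Values.service }}
name: {{ $fullname }}
port: {{ .port }}
{{- end }}
```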
Pipelines and Functions¶
Template functions can be chained using the pipe | operator:
```yaml
# Convert to uppercase
name: {{ .Values.name | upper }}

# Default value if empty
image: {{ .Values.image | default "nginx:latest" }}

# Quoting a value
version: {{ .Values.version | quote }}

# Trim and truncate
name: {{ .Values.name | trunc 63 | trimSuffix "-" }}

# Indentation (critical for YAML)
metadata:
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
```
Common Sprig Functions
- Sprig is a library that provides over 70 useful template functions for Go's template language.
- String: `upper`, `lower`, `title`, `trim`, `quote`, `trunc`, `trimSuffix`
- Defaults: `default`
- Indentation: `nindent`, `indent`
- Encoding/Conversion: `toYaml`, `toJson`, `b64enc`, `b64dec`
- Date/Time: `now`, `htmlDate`
- Crypto: `sha256sum`
Docs: Functions and Pipelines
Flow Control¶
Conditionals (if / else)¶
```yaml
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "myapp.fullname" . }}
spec:
  rules:
    - host: {{ .Values.ingress.host }}
{{- end }}
```

```yaml
# if / else if / else
{{- if eq .Values.env "production" }}
replicas: 5
{{- else if eq .Values.env "staging" }}
replicas: 2
{{- else }}
replicas: 1
{{- end }}
```
Comparison operators¶
| Operator | Description |
|---|---|
| `eq` | Equal |
| `ne` | Not equal |
| `lt` | Less than |
| `gt` | Greater than |
| `le` | Less than or equal |
| `ge` | Greater than or equal |
| `and` | Logical AND |
| `or` | Logical OR |
| `not` | Logical NOT |
Looping (range)¶
```yaml
# Iterating over a list
env:
  {{- range .Values.env }}
  - name: {{ .name }}
    value: {{ .value | quote }}
  {{- end }}

# Iterating over a map/dict
labels:
  {{- range $key, $value := .Values.labels }}
  {{ $key }}: {{ $value | quote }}
  {{- end }}
```
Scoping (with)¶
```yaml
# `with` changes the scope of `.` inside the block
{{- with .Values.nodeSelector }}
nodeSelector:
  {{- toYaml . | nindent 2 }}
{{- end }}
```
Docs: Flow Control
Named Templates (_helpers.tpl)¶
- Files prefixed with `_` (underscore) in the `templates/` directory are not rendered as Kubernetes manifests.
- They are used to define reusable named templates (also called partials or sub-templates).
- Named templates are defined with `define` and invoked with `include` (preferred) or `template`.
```yaml
# _helpers.tpl - defining a named template
{{- define "myapp.labels" -}}
app.kubernetes.io/name: {{ include "myapp.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

# Using the named template in a manifest
metadata:
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
```
include vs template
- Always prefer `include` over `template`.
- The `include` function allows you to pipe the output (e.g., `| nindent 4`), while `template` does not support pipelines.
Docs: Named Templates
Values Override Precedence¶
When installing or upgrading a release, values can be supplied from multiple sources. The override precedence (last wins) is:
1. `values.yaml` in the chart (defaults)
2. Parent chart's `values.yaml` (for subcharts)
3. Values file passed with `-f` / `--values`
4. Individual values set with `--set` or `--set-string`
```shell
# Override with a custom values file
helm install my-release ./mychart -f custom-values.yaml

# Override with --set (highest precedence)
helm install my-release ./mychart --set replicaCount=3

# Multiple overrides combined
helm install my-release ./mychart \
  -f production-values.yaml \
  --set image.tag="v2.0.0"

# Override with --set-string (forces string type)
helm install my-release ./mychart --set-string image.tag="1234"

# Override with --set-file (read value from a file)
helm install my-release ./mychart --set-file config=./my-config.txt
```
Docs: Values Files
Chart Dependencies (Subcharts)¶
Charts can depend on other charts. Dependencies are declared in Chart.yaml:
```yaml
# Chart.yaml
apiVersion: v2
name: my-app
version: 1.0.0
dependencies:
  - name: postgresql
    version: "12.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
  - name: redis
    version: "17.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
```

```shell
# Download and update dependencies
helm dependency update ./my-app

# The dependencies are stored in the charts/ directory
ls ./my-app/charts/
```
Docs: Chart Dependencies
Helm Hooks¶
Hooks allow you to run resources at specific points in a release lifecycle. They are standard Kubernetes resources (Jobs, Pods, ConfigMaps, etc.) with special annotations that tell Helm when to execute them.
Hook Types¶
| Hook | Description |
|---|---|
| `pre-install` | Runs before any resources are installed |
| `post-install` | Runs after all resources are installed |
| `pre-upgrade` | Runs before any resources are upgraded |
| `post-upgrade` | Runs after all resources are upgraded |
| `pre-delete` | Runs before any resources are deleted |
| `post-delete` | Runs after all resources are deleted |
| `pre-rollback` | Runs before a rollback |
| `post-rollback` | Runs after a rollback |
| `test` | Runs when `helm test` is called |
Hook Annotations¶
Hooks are controlled by three key annotations:
| Annotation | Description |
|---|---|
| `helm.sh/hook` | Defines when the hook runs (required). Can specify multiple hooks: `"pre-install,pre-upgrade"` |
| `helm.sh/hook-weight` | Defines execution order (default: `0`). Lower weights execute first. Can be negative. |
| `helm.sh/hook-delete-policy` | Defines when to delete the hook resource. Values: `before-hook-creation`, `hook-succeeded`, `hook-failed` |
Hook Deletion Policies¶
| Policy | Description |
|---|---|
| `before-hook-creation` | Delete the previous hook resource before a new one is launched (default) |
| `hook-succeeded` | Delete the hook resource after it successfully completes |
| `hook-failed` | Delete the hook resource if it fails |
You can specify multiple policies: "hook-succeeded,hook-failed"
Hook Execution Order¶
Hooks execute in the following order:
- Sorted by weight (ascending): hooks with lower weights run first
- Sorted by kind (alphabetical): if weights are equal
- Sorted by name (alphabetical): if both weight and kind are equal
Practical Examples¶
Example 1: Database Migration (pre-install/pre-upgrade)
- This hook runs a database migration job before installing or upgrading the main application resources.
- It ensures that the database schema is up-to-date before the application starts.
- The `migrate.sh` script would contain the logic to perform the database migration, and it would use environment variables to connect to the database.

```yaml
# templates/hooks/db-migrate.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-db-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
spec:
  template:
    metadata:
      name: {{ include "myapp.fullname" . }}-db-migrate
    spec:
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["./migrate.sh"]
          env:
            - name: DB_HOST
              value: {{ .Values.database.host }}
            - name: DB_NAME
              value: {{ .Values.database.name }}
      restartPolicy: Never
  backoffLimit: 3
```
Example 2: Schema Initialization (pre-install only)
- This hook runs a database initialization job only during the initial installation of the chart.
- It creates the database schema if it doesn’t already exist.
- It uses a lower weight to ensure it runs before the migration hook.
- The `psql` command is used to create the database, and it connects using environment variables for the database host and credentials.
- This hook will not run during upgrades, ensuring that it only initializes the database on the first install.

```yaml
# templates/hooks/db-init.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-db-init
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-5" # Runs before migration (weight: 0)
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      containers:
        - name: init-db
          image: postgres:14
          command:
            - sh
            - -c
            - |
              # PostgreSQL has no CREATE DATABASE IF NOT EXISTS, so check first
              psql -h $DB_HOST -U $DB_USER -tc "SELECT 1 FROM pg_database WHERE datname = '{{ .Values.database.name }}'" | grep -q 1 || \
                psql -h $DB_HOST -U $DB_USER -c "CREATE DATABASE {{ .Values.database.name }};"
          env:
            - name: DB_HOST
              value: {{ .Values.database.host }}
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: username
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: password
      restartPolicy: Never
```
Example 3: Service Readiness Check (post-install)
- This hook runs a job after the main application resources are installed to check if the service is ready.
- It uses a simple `curl` command to check the health endpoint of the service, retrying until it gets a successful response.
- This ensures that the application is fully operational before the release is considered successful.
- The hook will be deleted after it succeeds, preventing it from running again unnecessarily.
- This is particularly useful for applications that require some time to become ready after deployment, such as those that perform initialization tasks or have complex startup processes.
- By using a post-install hook, you can provide immediate feedback on the success of the deployment and ensure that users are aware of any issues with service readiness right after installation.
```yaml
# templates/hooks/smoke-test.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-smoke-test
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      containers:
        - name: smoke-test
          image: curlimages/curl:latest
          command:
            - sh
            - -c
            - |
              echo "Waiting for service to be ready..."
              until curl -f http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/health; do
                echo "Service not ready yet, retrying in 5 seconds..."
                sleep 5
              done
              echo "Service is ready!"
      restartPolicy: Never
  backoffLimit: 10
```
Example 4: Backup Before Upgrade (pre-upgrade)
- This hook creates a backup of the database before upgrading the application.
- It uses the `pg_dump` command to create a SQL backup file with a timestamp.
- The backup is stored in a persistent volume claim to ensure it's retained even if the job is deleted.
- The hook runs with a weight of `-10` to ensure it executes before other upgrade hooks like migrations.
- This is a critical safety measure to ensure you can restore your data if an upgrade fails or causes data corruption.

```yaml
# templates/hooks/backup.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-backup-{{ now | date "20060102-150405" }}
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      containers:
        - name: backup
          image: "{{ .Values.backup.image }}"
          command:
            - sh
            - -c
            - |
              echo "Creating backup before upgrade..."
              pg_dump -h $DB_HOST -U $DB_USER $DB_NAME > /backup/backup-$(date +%Y%m%d-%H%M%S).sql
              echo "Backup completed successfully"
          env:
            - name: DB_HOST
              value: {{ .Values.database.host }}
            - name: DB_USER
              value: {{ .Values.database.user }}
            - name: DB_NAME
              value: {{ .Values.database.name }}
          volumeMounts:
            - name: backup-storage
              mountPath: /backup
      volumes:
        - name: backup-storage
          persistentVolumeClaim:
            claimName: backup-pvc
      restartPolicy: Never
```
Example 5: Notification Hook (post-install/post-upgrade)
- This hook sends a notification to Slack after the application is successfully installed or upgraded.
- It uses the Helm built-in `.Release.IsInstall` variable to determine whether this is a new installation or an upgrade.
- The hook runs with a weight of `10` to ensure it executes after other post-install/upgrade hooks like smoke tests.
- The hook resource is cleaned up whether it succeeds or fails (deletion policy: `hook-succeeded,hook-failed`).
- This is useful for keeping your team informed about deployments in production environments.

```yaml
# templates/hooks/notify.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-notify
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "10" # Runs after smoke test
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
spec:
  template:
    spec:
      containers:
        - name: notify
          image: curlimages/curl:latest
          command:
            - sh
            - -c
            - |
              if [ "{{ .Release.IsInstall }}" = "true" ]; then
                ACTION="installed"
              else
                ACTION="upgraded"
              fi
              curl -X POST {{ .Values.slack.webhookUrl }} \
                -H 'Content-Type: application/json' \
                -d "{\"text\":\"Application {{ .Release.Name }} has been $ACTION to version {{ .Chart.Version }} in namespace {{ .Release.Namespace }}\"}"
      restartPolicy: Never
```
Example 6: Cleanup Hook (pre-delete)
- This hook performs cleanup operations before the main application resources are deleted.
- It uses `kubectl` to delete specific resources (in this case, ConfigMaps) that match certain labels.
- This is useful for cleaning up dynamically created resources that might not be tracked by Helm directly.
- The hook will be deleted after it succeeds, preventing orphaned cleanup jobs from accumulating.
```yaml
# templates/hooks/cleanup.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-cleanup
  annotations:
    "helm.sh/hook": pre-delete
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      containers:
        - name: cleanup
          image: bitnami/kubectl:latest
          command:
            - sh
            - -c
            - |
              echo "Cleaning up resources..."
              kubectl delete configmap -n {{ .Release.Namespace }} -l app={{ include "myapp.name" . }},release={{ .Release.Name }}
              echo "Cleanup completed"
      serviceAccountName: {{ include "myapp.fullname" . }}-cleanup
      restartPolicy: Never
```
Example 7: Secret Creation Hook (pre-install)
- This hook generates a random password and creates a Kubernetes secret before the main application resources are installed.
- It uses the `bitnami/kubectl` image to run `kubectl` commands directly from the job.
- The generated password is stored in a secret named `<release-name>-db-secret` in the same namespace as the release.
- The hook is set to run only during installation, ensuring that a new secret is created each time a new release is installed.
- The `before-hook-creation` deletion policy ensures that if the hook runs multiple times (e.g., due to retries), the previous hook resource will be deleted before a new one is created, preventing orphaned jobs from accumulating.
- This hook is useful for scenarios where you need to generate dynamic configuration or credentials that must be available before the main application resources are created.

```yaml
# templates/hooks/create-secret.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-create-secret
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-15" # Runs very early
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      containers:
        - name: create-secret
          image: bitnami/kubectl:latest
          command:
            - sh
            - -c
            - |
              # Generate random password
              PASSWORD=$(openssl rand -base64 32)
              # Create Kubernetes secret
              kubectl create secret generic {{ include "myapp.fullname" . }}-db-secret \
                --from-literal=password=$PASSWORD \
                --namespace={{ .Release.Namespace }} \
                --dry-run=client -o yaml | kubectl apply -f -
              echo "Secret created successfully"
      serviceAccountName: {{ include "myapp.fullname" . }}-admin
      restartPolicy: Never
```
Hook Best Practices¶
- Use appropriate weights: Order hooks logically (e.g., backup before migration)
- Set deletion policies: Clean up hook resources to avoid clutter
- Add timeouts: Use `activeDeadlineSeconds` in Job specs to prevent hanging
- Use backoff limits: Set `backoffLimit` to control retry attempts
- Handle idempotency: Hooks should be safe to run multiple times
- Consider rollback: Avoid destructive operations in pre-delete hooks
- Test hooks: Run `helm install --dry-run --debug` to preview hook behavior
- Use ServiceAccounts: Grant appropriate RBAC permissions for hooks that interact with the cluster
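The timeout and retry advice above can be sketched in a hook Job spec (the values are illustrative):

```yaml
spec:
  # Fail the hook if it runs longer than 5 minutes
  activeDeadlineSeconds: 300
  # Retry the hook pod at most twice before marking the Job failed
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
```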
Debugging Hooks¶
```shell
# View hook resources
kubectl get jobs,pods -n <namespace> -l heritage=Helm

# Check hook logs
kubectl logs job/<hook-job-name> -n <namespace>

# View hook status during install
helm install myapp ./chart --wait --debug

# Manually clean up failed hooks
kubectl delete job <hook-job-name> -n <namespace>
```
Docs: Chart Hooks
Helm Tests¶
- Helm tests live in `templates/tests/` and are pod definitions with the `"helm.sh/hook": test` annotation.
- They are executed with `helm test <release-name>`.
```yaml
# templates/tests/test-connection.yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "myapp.fullname" . }}-test-connection
  annotations:
    "helm.sh/hook": test
spec:
  containers:
    - name: wget
      image: busybox
      command: ['wget']
      args: ['{{ include "myapp.fullname" . }}:{{ .Values.service.port }}']
  restartPolicy: Never
```
Docs: Chart Tests
NOTES.txt - Post-Install Messages¶
You can create a templates/NOTES.txt file to display useful information after a chart is installed:
```
Thank you for installing {{ .Chart.Name }}!

Your release is named: {{ .Release.Name }}

To access the application, run:

  kubectl port-forward svc/{{ include "myapp.fullname" . }} 8080:{{ .Values.service.port }}

Then open http://localhost:8080 in your browser.
```
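After installation, the rendered notes can be shown again at any time:

```shell
# Re-display the NOTES.txt output for an installed release
helm get notes myrelease
```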
Lab¶
Step 01 - Installing Helm¶
- Before you can use the `codewizard-helm-demo` chart, you'll need to install `Helm` on your local machine.
- `Helm` install methods by OS:
Verify Installation¶
- To confirm that `Helm` is installed correctly, run:

```shell
helm version

## Expected output
## version.BuildInfo{Version:"xx", GitCommit:"xx", GitTreeState:"clean", GoVersion:"xx"}
```
Step 02 - Creating our Helm chart¶
- Create our custom `codewizard-helm-demo` `Helm` chart.
- The custom `codewizard-helm-demo` `Helm` chart is built upon the following K8S resources:
    - ConfigMap
    - Deployment
    - Service
- As mentioned above, we will also have the following `Helm` resources:
    - Chart.yaml
    - values.yaml
    - templates/_helpers.tpl
Create a New Chart¶
- First, we need to create a `Helm` chart using the `helm create` command.
- This command will generate the necessary file structure for your new chart.
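Using the chart name from this lab, the command is:

```shell
helm create codewizard-helm-demo
```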
What is the result of this command?
- Examine the chart structure!
- Try to explain to yourself which files are in the folder.
- See the above reference to the structure Chart files and folders
Navigate to the Chart Directory¶
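Assuming the chart was created in the current directory:

```shell
cd codewizard-helm-demo
```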
Write the chart content¶
- Copy the content of the chart folder (in this lab) to the chart directory (overwriting the files).
Step 03 - Pack the chart¶
- After we have created or customized our chart, we need to pack it as a .tgz file, which can then be shared or installed.
helm package¶
Helm Package
helm package packages a chart into a versioned chart archive file.
If a path is given, this will look at that path for a chart (which must contain a Chart.yaml file) and then package that directory.
- This command will create a file called codewizard-helm-demo-<version>.tgz inside your current directory.
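Run the packaging step from the directory that contains the chart folder:

```shell
helm package codewizard-helm-demo
```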
Step 04 - Validate the chart content¶
helm template¶
- Helm allows you to generate the Kubernetes manifests from the templates and values files without actually installing the chart.
- This is useful to preview what the generated resources will look like:
helm template codewizard-helm-demo
## This will output the rendered Kubernetes manifests to your terminal
helm lint¶
- You can also lint the chart to check for well-formedness and best practices:
helm lint codewizard-helm-demo
## Expected output:
## ==> Linting codewizard-helm-demo
## [INFO] Chart.yaml: icon is recommended
## 1 chart(s) linted, 0 chart(s) failed
Step 05 - Install the chart¶
- Install the codewizard-helm-demo chart into the Kubernetes cluster
The helm install command¶
- This command installs a chart archive.
- The install argument must be a chart reference, a path to a packed chart, a path to an unpacked chart directory or a URL.
- To override values in a chart, use:
    - --values - pass in a file
    - --set - pass configuration from the command line
- Use --dry-run to simulate an install without actually deploying:
# Dry run - preview what will be installed without deploying
helm install codewizard-helm-demo codewizard-helm-demo-0.1.0.tgz --dry-run
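Once the dry run looks good, perform the real install (no --namespace flag is assumed here, since this lab's templates pin their resources to the codewizard namespace):

```shell
# Install the packed chart
helm install codewizard-helm-demo codewizard-helm-demo-0.1.0.tgz
```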
Step 06 - Verify the installation¶
- Examine the newly created Helm chart release, and all cluster-created resources:
# List the installed releases
helm ls
# Show detailed status of the release
helm status codewizard-helm-demo
# Get the rendered manifests of the release
helm get manifest codewizard-helm-demo
# Get the values used by the release
helm get values codewizard-helm-demo
# Check the resources
kubectl get all -n codewizard
Step 07 - Test the service¶
- Perform an HTTP GET request against the newly created cluster service.
- Confirm that the response contains the CodeWizard Helm Demo message passed from the values.yaml file.
kubectl run busybox \
--image=busybox \
--rm \
-it \
--restart=Never \
-- /bin/sh -c "wget -qO- http://codewizard-helm-demo.codewizard.svc.cluster.local"
### Output:
CodeWizard Helm Demo
- You can also test the release name and revision endpoints defined in the ConfigMap:
# Get the release name
kubectl run busybox --image=busybox --rm -it --restart=Never \
-- /bin/sh -c "wget -qO- http://codewizard-helm-demo.codewizard.svc.cluster.local/release/name"
# Get the release revision
kubectl run busybox --image=busybox --rm -it --restart=Never \
-- /bin/sh -c "wget -qO- http://codewizard-helm-demo.codewizard.svc.cluster.local/release/revision"
Step 08 - Upgrade the release¶
# Upgrade and pass a different message than the one from the default values
# Use --set to pass the desired value
helm upgrade \
codewizard-helm-demo \
codewizard-helm-demo-0.1.0.tgz \
--set nginx.conf.message="Helm Rocks"
Step 09 - Check the upgrade¶
- Perform another HTTP GET request.
- Confirm that the response now has the updated message Helm Rocks:
kubectl run busybox \
--image=busybox \
--rm \
-it \
--restart=Never \
-- /bin/sh -c "wget -qO- http://codewizard-helm-demo.codewizard.svc.cluster.local"
### Output:
Helm Rocks
- Also check the revision number - it should now be 2:
kubectl run busybox --image=busybox --rm -it --restart=Never \
-- /bin/sh -c "wget -qO- http://codewizard-helm-demo.codewizard.svc.cluster.local/release/revision"
### Output:
2
Step 10 - View the release history¶
helm history¶
- helm history prints historical revisions for a given release.
- A default maximum of 256 revisions will be returned.
helm history codewizard-helm-demo
### Sample output
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 ... superseded codewizard-helm-demo-0.1.0 1.19.7 Install complete
2 ... deployed codewizard-helm-demo-0.1.0 1.19.7 Upgrade complete
Step 11 - Rollback¶
helm rollback¶
- Rollback the codewizard-helm-demo release to the previous version:
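Assuming revision 1 is the original install (as shown by helm history above), the rollback would be:

```shell
# Roll back to revision 1 (the original install)
helm rollback codewizard-helm-demo 1
```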
- Check again to verify that you get the original message!
Exercises¶
The following exercises will test your understanding of Helm concepts.
Try to solve each exercise on your own before revealing the solution.
01. Explore a Public Chart Repository¶
Add the Bitnami chart repository and search for an nginx chart.
Scenario:¶
- You need to find and inspect a publicly available Helm chart before installing it.
- Chart repositories are the standard way to discover and share Helm charts.
Hint: Use helm repo add, helm repo update, and helm search repo.
Solution
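A possible solution, assuming network access to the Bitnami chart repository:

```shell
# Add the Bitnami repository
helm repo add bitnami https://charts.bitnami.com/bitnami

# Refresh the local repository cache
helm repo update

# Search the repository for nginx charts
helm search repo nginx

# Inspect the chart metadata before installing
helm show chart bitnami/nginx
```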
02. Install with Custom Values File¶
Create a custom values.yaml file that changes the replicaCount to 3 and the message to "Hello from custom values", then install the chart using this file.
Scenario:¶
- In production environments, you rarely use default values.
- Custom values files let you manage environment-specific configurations (dev, staging, prod).
Hint: Create a YAML file and use helm install -f <file>.
Solution
# Create a custom values file
cat <<EOF > custom-values.yaml
replicaCount: 3
nginx:
conf:
message: "Hello from custom values"
EOF
# Install with the custom values file
helm install custom-demo codewizard-helm-demo-0.1.0.tgz -f custom-values.yaml
# Verify the replica count
kubectl get deployment -n codewizard
# Verify the message
kubectl run busybox --image=busybox --rm -it --restart=Never \
-- /bin/sh -c "wget -qO- http://custom-demo.codewizard.svc.cluster.local"
# Cleanup
helm uninstall custom-demo
rm custom-values.yaml
03. Debug a Failing Template¶
Given the following broken template snippet, identify and fix the error.
Save this as templates/broken.yaml in the chart, then use helm template to find the issue:
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ .Values.brokenName }
data:
key: {{ .Values.missingValue | default "fallback" }}
Scenario:¶
- Template syntax errors are common during chart development.
- helm template and helm lint are your best debugging tools.
Hint: Use helm template and helm lint to identify the error. Count the curly braces.
Solution
The error is a missing closing brace on line 4: `{{ .Values.brokenName }` should be `{{ .Values.brokenName }}`.
# Lint the chart to find errors
helm lint codewizard-helm-demo
# Try to render the template - this will show the error
helm template codewizard-helm-demo
# Fix: the correct line should be:
# name: {{ .Values.brokenName }}
# (two closing braces instead of one)
# Don't forget to remove the broken test file
rm codewizard-helm-demo/templates/broken.yaml
04. Use --set to Override Multiple Values¶
Upgrade the codewizard-helm-demo release to use 3 replicas, image tag 1.21.0, and the message "Multi-set Override" - all in a single command.
Scenario:¶
- Quick overrides using --set are common for CI/CD pipelines and ad-hoc changes.
- You need to understand the dot-notation for nested values.
Hint: Chain multiple --set flags or use comma-separated notation.
Solution
# Method 1: multiple --set flags
helm upgrade codewizard-helm-demo codewizard-helm-demo-0.1.0.tgz \
--set replicaCount=3 \
--set image.tag="1.21.0" \
--set nginx.conf.message="Multi-set Override"
# Method 2: comma-separated (equivalent)
helm upgrade codewizard-helm-demo codewizard-helm-demo-0.1.0.tgz \
--set replicaCount=3,image.tag=1.21.0,nginx.conf.message="Multi-set Override"
# Verify the values
helm get values codewizard-helm-demo
# Verify replicas and image
kubectl get deployment -n codewizard -o wide
05. Add a Conditional Resource¶
Modify the codewizard-helm-demo chart to add an optional Ingress resource that is only created when ingress.enabled is set to true in values.yaml.
Scenario:¶
- Not all environments need an Ingress (e.g., local development vs. production).
- Conditional rendering with if blocks is a fundamental Helm templating pattern.
Hint: Use {{- if .Values.ingress.enabled }} … {{- end }}. Add ingress.enabled: false to values.yaml.
Solution
# 1. Add ingress values to values.yaml
cat <<EOF >> codewizard-helm-demo/values.yaml
ingress:
enabled: false
host: demo.example.com
EOF
# 2. Create templates/Ingress.yaml
# Save this as codewizard-helm-demo/templates/Ingress.yaml:
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: {{ include "webserver.fullname" . }}
namespace: codewizard
labels:
{{- include "webserver.labels" . | nindent 4 }}
spec:
rules:
- host: {{ .Values.ingress.host }}
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: {{ include "webserver.fullname" . }}
port:
number: {{ .Values.service.port }}
{{- end }}
# 3. Verify: template without ingress (should NOT include Ingress resource)
helm template codewizard-helm-demo | grep -A5 "kind: Ingress"
# 4. Verify: template WITH ingress enabled (should include Ingress resource)
helm template codewizard-helm-demo --set ingress.enabled=true | grep -A20 "kind: Ingress"
# 5. Cleanup - remove the ingress template
rm codewizard-helm-demo/templates/Ingress.yaml
06. Add a Named Template¶
Create a new named template in _helpers.tpl called webserver.annotations that generates a set of annotations including the chart version and a custom team annotation from values. Then use it in the Deployment.
Scenario:¶
- Named templates reduce duplication across manifests.
- Annotations are commonly used for metadata, monitoring, and CI/CD integration.
Hint: Use {{- define "webserver.annotations" -}} to define and {{ include "webserver.annotations" . | nindent N }} to use it.
Solution
# 1. Add to _helpers.tpl:
{{/*
Common annotations
*/}}
{{- define "webserver.annotations" -}}
app.kubernetes.io/chart: {{ include "webserver.chart" . }}
app.kubernetes.io/team: {{ .Values.team | default "platform" }}
{{- end }}
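To complete the exercise, the named template still has to be referenced from the Deployment. A sketch of the usage (exact placement depends on your Deployment template):

```yaml
# 2. Reference the named template in templates/Deployment.yaml
metadata:
  annotations:
    {{- include "webserver.annotations" . | nindent 4 }}
```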
07. Use range to Generate Multiple Environment Variables¶
Modify the Deployment template to inject a list of environment variables from values.yaml using the range function.
Scenario:¶
- Real-world deployments often require multiple environment variables.
- Hardcoding them in templates is not maintainable - values-driven configuration is preferred.
Hint: Add an env list to values.yaml and use {{- range .Values.env }} in the container spec.
Solution
# 1. Add to values.yaml:
env:
- name: APP_ENV
value: "production"
- name: LOG_LEVEL
value: "info"
- name: APP_VERSION
value: "1.0.0"
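A sketch of the matching container-spec change, iterating over the list defined above with range:

```yaml
# 2. In the container spec of templates/Deployment.yaml
env:
  {{- range .Values.env }}
  - name: {{ .name }}
    value: {{ .value | quote }}
  {{- end }}
```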
08. Create a Helm Test¶
Add a Helm test to the codewizard-helm-demo chart that verifies the service is reachable and returns the expected message.
Scenario:¶
- Helm tests allow you to validate that a release is working correctly after deployment.
- Tests are pod definitions with the "helm.sh/hook": test annotation.
Hint: Create a file in templates/tests/ with a busybox pod that runs wget against the service.
Solution
# 2. Create templates/tests/test-connection.yaml:
apiVersion: v1
kind: Pod
metadata:
name: {{ include "webserver.fullname" . }}-test-connection
namespace: codewizard
labels:
{{- include "webserver.labels" . | nindent 4 }}
annotations:
"helm.sh/hook": test
spec:
containers:
- name: wget
image: busybox
command: ['sh', '-c']
args:
- |
RESPONSE=$(wget -qO- http://{{ include "webserver.fullname" . }}.codewizard.svc.cluster.local)
echo "Response: $RESPONSE"
echo "$RESPONSE" | grep -q "{{ .Values.nginx.conf.message }}"
restartPolicy: Never
09. Manage Chart Dependencies¶
Create a new Helm chart that depends on the bitnami/redis chart as a subchart. Configure the dependency and update it.
Scenario:¶
- Most real-world applications depend on databases, caches, or message queues.
- Helm dependencies let you compose complex deployments from reusable charts.
Hint: Add a dependencies section to Chart.yaml, then run helm dependency update.
Solution
# 2. Add dependencies to Chart.yaml (append at the end):
dependencies:
- name: redis
version: "~17"
repository: "https://charts.bitnami.com/bitnami"
condition: redis.enabled
# 3. Add Redis configuration to values.yaml:
redis:
enabled: true
architecture: standalone
auth:
enabled: false
# 4. Add the Bitnami repo (if not already added)
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# 5. Download the dependency charts
helm dependency update .
# 6. Verify the dependency was downloaded
ls charts/
# 7. Preview the rendered output (Redis resources will be included)
helm template myapp-with-deps . | grep "kind:" | sort | uniq
# 8. Cleanup
cd ..
rm -rf myapp-with-deps
10. Create a Pre-Install Hook¶
Add a pre-install hook to the codewizard-helm-demo chart that creates a Job to print a banner message before the main resources are installed.
Scenario:¶
- Hooks allow you to run setup, migration, or validation tasks at specific lifecycle points.
- pre-install hooks run before the main chart resources are created.
Hint: Create a Job manifest with the annotation "helm.sh/hook": pre-install and "helm.sh/hook-delete-policy": hook-succeeded.
Solution
# Create templates/pre-install-hook.yaml:
apiVersion: batch/v1
kind: Job
metadata:
name: {{ include "webserver.fullname" . }}-pre-install
namespace: codewizard
annotations:
"helm.sh/hook": pre-install
"helm.sh/hook-weight": "-5"
"helm.sh/hook-delete-policy": hook-succeeded
spec:
template:
spec:
containers:
- name: pre-install
image: busybox
command: ['sh', '-c', 'echo "=== Installing {{ .Release.Name }} (Chart: {{ .Chart.Name }}-{{ .Chart.Version }}) ==="']
restartPolicy: Never
backoffLimit: 1
# Install and observe the hook
helm install codewizard-helm-demo codewizard-helm-demo-0.1.0.tgz
# The Job should run and complete before the main resources are created
# Check the jobs (it may already be cleaned up due to hook-delete-policy)
kubectl get jobs -n codewizard
# Cleanup
helm uninstall codewizard-helm-demo
rm codewizard-helm-demo/templates/pre-install-hook.yaml
11. Diff Before Upgrade¶
Use the helm-diff plugin to preview what changes an upgrade will make before applying it.
Scenario:¶
- In production, blindly upgrading without reviewing changes is risky.
- The helm-diff plugin shows a diff of what would change, similar to kubectl diff.
Hint: Install the plugin with helm plugin install, then use helm diff upgrade.
Solution
# 1. Install the helm-diff plugin
helm plugin install https://github.com/databus23/helm-diff
# 2. Make sure the release is installed
helm install codewizard-helm-demo codewizard-helm-demo-0.1.0.tgz
# 3. Preview what an upgrade would change (without applying)
helm diff upgrade codewizard-helm-demo codewizard-helm-demo-0.1.0.tgz \
--set nginx.conf.message="Updated Message" \
--set replicaCount=5
# The output shows colorized diff of what resources would change
# 4. If you're satisfied, apply the upgrade
helm upgrade codewizard-helm-demo codewizard-helm-demo-0.1.0.tgz \
--set nginx.conf.message="Updated Message" \
--set replicaCount=5
# Cleanup
helm uninstall codewizard-helm-demo
12. Create a NOTES.txt¶
Add a NOTES.txt file to the codewizard-helm-demo chart that displays the release name, namespace, and instructions for testing the service after installation.
Scenario:¶
- NOTES.txt provides post-install guidance to users who install your chart.
- It supports the same Go template syntax as other template files.
Hint: Create templates/NOTES.txt with template directives like {{ .Release.Name }}.
Solution
# Create templates/NOTES.txt with the following content:
==================================================
{{ .Chart.Name }} has been installed!
==================================================
Release Name : {{ .Release.Name }}
Namespace : codewizard
Revision : {{ .Release.Revision }}
Chart Version: {{ .Chart.Version }}
App Version : {{ .Chart.AppVersion }}
To test the service, run:
kubectl run busybox --image=busybox --rm -it --restart=Never \
-- /bin/sh -c "wget -qO- http://{{ include "webserver.fullname" . }}.codewizard.svc.cluster.local"
To check release status:
helm status {{ .Release.Name }}
To uninstall:
helm uninstall {{ .Release.Name }}
Finalize & Cleanup¶
- To remove all resources created by this lab, uninstall the codewizard-helm-demo release:
- (Optional) If you have created a dedicated namespace for this lab, you can delete it by running:
- (Optional) Remove added Helm repositories:
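A hedged set of cleanup commands matching the bullets above (the namespace and repository names assume the defaults used in this lab and its exercises):

```shell
# Uninstall the release
helm uninstall codewizard-helm-demo

# (Optional) Delete the lab namespace
kubectl delete namespace codewizard

# (Optional) Remove repositories added during the exercises
helm repo remove bitnami
```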
Troubleshooting¶
- Helm not found:
Make sure Helm is installed and available in your PATH.
Run the following to verify:
- Pods not starting:
Check pod status and logs by running the following commands:
kubectl get pods -n codewizard
kubectl describe pod <pod-name> -n codewizard
kubectl logs <pod-name> -n codewizard
- Service not reachable:
Ensure the service and pods are running by running the following commands:
- Values not updated after upgrade:
Double-check your --set or --values flags and confirm the upgrade by running:
- Template rendering errors:
Use helm template and helm lint to find and debug template issues:
- Hook failures:
Inspect hook resources (Jobs, Pods) to check their status and logs:
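Hedged commands for the checks above (release, namespace, and resource names assume this lab's defaults):

```shell
# Helm not found - verify the installation
helm version

# Service not reachable - check service and endpoints
kubectl get svc -n codewizard
kubectl get endpoints -n codewizard

# Values not updated after upgrade - inspect the live values
helm get values codewizard-helm-demo

# Template rendering errors - render and lint locally
helm template codewizard-helm-demo
helm lint codewizard-helm-demo

# Hook failures - inspect hook Jobs and Pods
kubectl get jobs,pods -n codewizard
kubectl logs job/<job-name> -n codewizard
```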
Next Steps¶
- Try creating your own Helm chart for a different application.
- Explore Helm chart repositories like Artifact Hub.
- Learn about advanced Helm features, such as dependencies, hooks, and chart testing.
- Explore Helmfile for declarative management of multiple Helm releases.
- Learn about Helm Secrets for managing sensitive data in charts.
ArgoCD¶
- ArgoCD is a declarative, GitOps continuous delivery tool for Kubernetes.
- It follows the GitOps pattern where a Git repository is the single source of truth for the desired application state.
- ArgoCD automates the deployment and reconciliation of applications against one or more Kubernetes clusters.
What will we learn?¶
- What ArgoCD is and why it is useful for GitOps
- How to install and configure ArgoCD on Kubernetes using Helm
- ArgoCD core concepts: Applications, Projects, Sync, and Health
- How to expose ArgoCD via an Nginx Ingress
- How to deploy applications from Git repositories
- Application health and sync status monitoring
- Rollback and sync strategies
- The App of Apps pattern to manage multiple applications declaratively
- How to deploy the EFK stack (Lab 29) via ArgoCD
Official Documentation & References¶
| Resource | Link |
|---|---|
| ArgoCD Official Documentation | argo-cd.readthedocs.io |
| ArgoCD CLI Reference | argo-cd.readthedocs.io/cli |
| ArgoCD Helm Chart (Argo Helm Repo) | artifacthub.io |
| App of Apps Pattern | argo-cd.readthedocs.io/bootstrapping |
| Sync Waves & Phases | argo-cd.readthedocs.io/sync-waves |
| ApplicationSet Controller | argo-cd.readthedocs.io/applicationset |
| GitOps Principles | gitops.tech |
| ArgoCD GitHub Repository | github.com/argoproj/argo-cd |
What is ArgoCD?¶
- ArgoCD is a pull-based GitOps operator: it watches Git and continuously reconciles the cluster state to match what is defined in Git.
- It consists of a control plane (running in the cluster) and optionally a CLI for interacting with it.
- Unlike push-based CI/CD (Jenkins, GitHub Actions), no pipeline ever needs direct kubectl access to the cluster.
Terminology¶
| Term | Description |
|---|---|
| Application | An ArgoCD resource linking a Git path to a target Kubernetes cluster+namespace |
| Project | A logical grouping of applications; controls permitted repos/clusters/namespaces |
| Sync | The process of applying Git state to the live cluster |
| Sync Status | Synced - live matches Git; OutOfSync - live differs from Git |
| Health Status | Healthy, Progressing, Degraded, Suspended, or Missing |
| App of Apps | A pattern where one ArgoCD Application manages a directory of child Application manifests |
ArgoCD Components¶
| Component | Description |
|---|---|
| API Server | Exposes REST/gRPC API consumed by Web UI, CLI, and CI/CD systems |
| Repository Server | Local cache of Git repositories; renders Helm, Kustomize, plain YAML |
| Application Controller | Monitors live cluster state and compares against desired state from Git |
| Dex | Identity service for integrating with external SSO providers |
| Redis | Caching layer; short-lived state between components |
Common ArgoCD CLI Commands¶
Below are the most common argocd CLI commands. Each command includes syntax, description, and detailed usage examples.
argocd login - Authenticate to ArgoCD server
Syntax: argocd login <server>
Description: Authenticates to an ArgoCD server and creates a local session used for subsequent CLI commands.
- Supports insecure mode (no TLS verification) for local/dev environments
- Can use username/password or SSO token
- Session is stored in ~/.config/argocd/config

# Login via Ingress
argocd login argocd.local --insecure
# Login via port-forward
argocd login localhost:8080 --insecure
# Login with explicit credentials
argocd login argocd.local \
  --username admin \
  --password <password> \
  --insecure
# Login to a remote secured server
argocd login argocd.example.com --grpc-web
# Change the admin password after login
argocd account update-password
# Show current login/user info
argocd account get-user-info
argocd app create - Create a new application
Syntax: argocd app create <name>
Description: Creates a new ArgoCD Application resource that links a Git repository path to a target Kubernetes cluster and namespace.
- Links a Git source to a Kubernetes destination
- Supports Helm charts, Kustomize, and plain YAML
- Can enable automated sync policies at creation time

# Create from a plain YAML path in Git
argocd app create guestbook \
  --repo https://github.com/argoproj/argocd-example-apps.git \
  --path guestbook \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace guestbook
# Create with automated sync, self-heal and auto-prune
argocd app create guestbook \
  --repo https://github.com/argoproj/argocd-example-apps.git \
  --path guestbook \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace guestbook \
  --sync-policy automated \
  --auto-prune \
  --self-heal
# Create from a Helm chart in a Git repo
argocd app create my-helm-app \
  --repo https://github.com/my-org/my-charts.git \
  --path charts/my-app \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace my-app \
  --helm-set replicaCount=2
# Create from an OCI/Helm chart registry
argocd app create nginx-helm \
  --repo https://charts.bitnami.com/bitnami \
  --helm-chart nginx \
  --revision 15.1.0 \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace default
# Create and auto-create the namespace
argocd app create my-app \
  --repo https://github.com/my-org/my-repo.git \
  --path manifests \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace production \
  --sync-option CreateNamespace=true
argocd app list - List all applications
Syntax: argocd app list
Description: Lists all ArgoCD applications and their current sync and health status.
- Shows application name, cluster, namespace, sync status, and health
- Supports filtering by project, sync status, and health status
- Useful for a quick overview of all managed applications

# List all applications
argocd app list
# List applications in a specific project
argocd app list -p my-project
# Filter by sync status
argocd app list --sync-status OutOfSync
# Filter by health status
argocd app list --health-status Degraded
# Output as JSON
argocd app list -o json
# Output as YAML
argocd app list -o yaml
# Show only app names
argocd app list -o name
# Wide output with extra columns
argocd app list -o wide
argocd app get - Get details of an application
Syntax: argocd app get <name>
Description: Shows detailed information about a specific ArgoCD application including sync status, health, and the managed resource tree.
- Displays sync and health status
- Shows all Kubernetes resources managed by the application
- Use --refresh to force-fetch the latest state from Git

# Get application details
argocd app get guestbook
# Force refresh from Git before displaying
argocd app get guestbook --refresh
# Show resource tree
argocd app get guestbook --output tree
# Output as JSON
argocd app get guestbook -o json
# Output as YAML
argocd app get guestbook -o yaml
# Watch live status updates
watch argocd app get guestbook
argocd app sync - Sync (deploy) an application
Syntax: argocd app sync <name>
Description: Triggers a sync operation that applies the desired Git state to the live cluster.
- Applies the desired state from Git to the cluster
- Supports dry-run mode to preview changes without applying them
- Can force sync to replace resources or selectively sync specific resources

# Sync an application
argocd app sync guestbook
# Sync and wait for completion
argocd app sync guestbook --timeout 120
# Dry-run - preview changes without applying
argocd app sync guestbook --dry-run
# Force sync (replace resources even if spec is unchanged)
argocd app sync guestbook --force
# Sync a specific resource only
argocd app sync guestbook \
  --resource apps:Deployment:guestbook-ui
# Sync with prune (delete resources removed from Git)
argocd app sync guestbook --prune
# Sync multiple applications at once
argocd app sync guestbook efk-stack app-of-apps
argocd app diff - Show diff between Git and live state
Syntax: argocd app diff <name>
Description: Shows the difference between the desired state in Git and the live state in the cluster. Useful for diagnosing configuration drift.
- Outputs a unified diff of desired vs live state
- Helps diagnose OutOfSync applications before syncing
- Can compare against a specific Git revision

# Show diff for an application
argocd app diff guestbook
# Show diff and compare against a specific revision
argocd app diff guestbook --revision HEAD~1
# Show diff for a specific resource type
argocd app diff guestbook \
  --resource apps:Deployment:guestbook-ui
# Use in CI - exit non-zero if drift detected
argocd app diff guestbook; echo "Exit code: $?"
argocd app set - Update application settings
Syntax: argocd app set <name> [flags]
Description: Updates the configuration of an existing ArgoCD application without deleting and recreating it.
- Modify source repository, path, revision, destination, or sync policy
- Enable or disable auto-sync, self-heal, and auto-prune
- Override Helm values or add/remove sync options

# Enable automated sync
argocd app set guestbook --sync-policy automated
# Enable self-heal and auto-prune
argocd app set guestbook --self-heal --auto-prune
# Disable automated sync (switch to manual)
argocd app set guestbook --sync-policy none
# Change the target revision (branch, tag, or commit SHA)
argocd app set guestbook --revision v1.2.0
# Override a Helm value
argocd app set my-helm-app --helm-set replicaCount=3
# Add a sync option
argocd app set guestbook --sync-option CreateNamespace=true
# Change the target namespace
argocd app set guestbook --dest-namespace new-namespace
argocd app history - Show deployment history
Syntax: argocd app history <name>
Description: Prints the deployment history for an application, listing every revision that has been deployed to the cluster.
- Shows revision ID, timestamp, Git commit SHA, and deploy status
- Use revision IDs from history to target a specific rollback
- History is maintained by ArgoCD in its internal state store
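Typical usage, matching the pattern of the other command sections:

```shell
# Show deployment history for an application
argocd app history guestbook
```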
argocd app rollback - Rollback to a previous revision
Syntax: argocd app rollback <name> <revision-id>
Description: Rolls back an application to a previously deployed revision. The revision ID is obtained from argocd app history.
- Reverts the live cluster to a previously deployed state
- Disables automated sync on the app to prevent re-syncing forward
- The rollback operation is recorded as a new entry in history

# Check history to find the target revision ID
argocd app history guestbook
# Rollback to a specific revision
argocd app rollback guestbook 3
# Rollback and wait for completion
argocd app rollback guestbook 3 --timeout 120
# Re-enable auto-sync after rollback
argocd app set guestbook --sync-policy automated
# Verify the rollback succeeded
argocd app get guestbook
kubectl get all -n guestbook
argocd app delete - Delete an application
Syntax: argocd app delete <name>
Description: Deletes an ArgoCD application. By default performs a cascade delete, removing all managed Kubernetes resources along with the Application resource.
- Cascade delete removes all Kubernetes resources managed by the app
- Non-cascade delete removes only the ArgoCD Application resource itself
- Use --yes to skip the confirmation prompt in scripts

# Delete an application (cascade - removes all K8s resources)
argocd app delete guestbook
# Delete without removing Kubernetes resources (non-cascade)
argocd app delete guestbook --cascade=false
# Skip confirmation prompt (useful in scripts)
argocd app delete guestbook --yes
# Delete multiple applications
argocd app delete guestbook efk-stack --yes
argocd repo - Manage repositories
Syntax: argocd repo add <url> / argocd repo list
Description: Manages Git and Helm chart repositories connected to ArgoCD. ArgoCD must have access to a repository before it can deploy from it.
- Add public or private repositories
- Supports HTTPS tokens, SSH keys, and TLS certificates for authentication
- List and remove existing repository connections

# List all connected repositories
argocd repo list
# Add a public HTTPS repository
argocd repo add https://github.com/argoproj/argocd-example-apps.git
# Add a private repository with an HTTPS token
argocd repo add https://github.com/my-org/private-repo.git \
  --username git \
  --password <token>
# Add a repository using an SSH key
argocd repo add git@github.com:my-org/private-repo.git \
  --ssh-private-key-path ~/.ssh/id_rsa
# Add a Helm chart repository
argocd repo add https://charts.bitnami.com/bitnami \
  --type helm \
  --name bitnami
# Remove a repository
argocd repo rm https://github.com/my-org/private-repo.git
argocd context - Manage server contexts
Syntax: argocd context [context-name]
Description: Manages multiple ArgoCD server connections (contexts), similar to kubectl config use-context. Useful when managing applications across multiple clusters or environments.
- Switch between multiple ArgoCD server connections
- Contexts are stored in ~/.config/argocd/config
- Each context holds a server address and authentication token
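This section has no example block; typical usage of the context subcommand might look like:

```shell
# List all saved contexts
argocd context

# Switch to a named context
argocd context argocd.example.com
```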
Architecture¶
graph TB
dev["Developer\npushes to Git"] --> git["Git Repository\n(Source of Truth)"]
subgraph cluster["Kubernetes Cluster"]
subgraph argocd["argocd namespace"]
api["ArgoCD API Server\n(argocd.local via Ingress)"]
ctrl["Application Controller"]
repo["Repository Server"]
dex["Dex (SSO)"]
redis["Redis (cache)"]
ingress["Nginx Ingress\nargocd.local"]
end
subgraph apps["Managed Namespaces"]
guestbook["guestbook namespace\nGuestbook App"]
efk["efk namespace\nEFK Stack"]
end
end
browser["Browser / argocd CLI"] --> ingress
ingress --> api
git --> repo
repo --> ctrl
ctrl --> guestbook
ctrl --> efk
api --> ctrl
api --> dex
ctrl --> redis
App of Apps Pattern¶
graph TD
root["app-of-apps\nwatches: Labs/18-ArgoCD/apps/"]
root --> guestbook["guestbook\nrepo: argocd-example-apps\nnamespace: guestbook"]
root --> efk["efk-stack\nwatches: Labs/33-EFK/argocd-apps/"]
efk --> es["efk-elasticsearch\nHelm chart\nwave: 0"]
efk --> fb["efk-filebeat\nHelm chart\nwave: 1"]
efk --> kb["efk-kibana\nHelm chart\nwave: 1"]
efk --> lg["efk-log-generator\nHelm chart\nwave: 2"]
efk --> lp["efk-log-processor\nHelm chart\nwave: 2"]
Directory Structure¶
18-ArgoCD/
├── README.md               # This file
├── demo.sh                 # Full automated demo script
├── ArgoCD.sh               # Legacy install script (manual)
├── install-argocd.sh       # Install ArgoCD via kustomize (legacy)
├── install.sh              # Print admin password
├── run-demo.sh             # Run guestbook demo (legacy)
│
├── manifests/
│   └── argocd-ingress.yaml # Nginx Ingress for ArgoCD UI
│
├── apps/                   # App of Apps - all YAML files here are managed
│   ├── app-of-apps.yaml    # Root application - points to this apps/ folder
│   ├── guestbook.yaml      # Guestbook demo application
│   └── efk-stack.yaml      # EFK stack App of Apps (points to Lab 29)
│
├── guestbook-app.yaml      # Standalone guestbook application manifest
├── kustomization.yaml      # Kustomize patch for argocd-server --insecure
└── patch-replace.yaml      # Kustomize strategic merge patch
Prerequisites¶
- Kubernetes cluster (v1.24+)
- `kubectl` configured to access your cluster
- Helm 3.x installed
- Nginx Ingress Controller installed on the cluster
- (Optional) `argocd` CLI
# Install kubectl (macOS)
brew install kubectl
# Install Helm
brew install helm
# Install argocd CLI (optional)
brew install argocd
# Install Nginx Ingress Controller (if not present)
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx --create-namespace
Lab¶
Part 01 - Full Automated Demo¶
The demo.sh script handles the complete lifecycle in one command: ArgoCD installation via Helm, Ingress setup, Guestbook deployment, and App of Apps deployment.
01. Run the Demo¶
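The exact invocation is not shown in this section; based on the subcommands listed below (`status`, `credentials`, `cleanup`), the full demo presumably runs when the script is called with no arguments:

```shell
cd Labs/18-ArgoCD
./demo.sh
```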
The script will:
- Install ArgoCD via Helm (`argo/argo-cd` chart) with `--insecure` mode
- Apply the Nginx Ingress pointing `argocd.local` to the ArgoCD server
- Print admin credentials
- Deploy the Guestbook demo application and wait for it to sync
- Deploy the App of Apps, which triggers the EFK stack deployment from Lab 29
02. Other Commands¶
# Show current status of all applications
./demo.sh status
# Print admin username and password
./demo.sh credentials
# Remove all ArgoCD resources and managed apps
./demo.sh cleanup
Part 02 - Manual ArgoCD Installation¶
01. Install ArgoCD via Helm¶
# Add Argo Helm repository
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update argo
# Install ArgoCD (insecure mode - TLS handled by Ingress)
helm upgrade --install argocd argo/argo-cd \
--namespace argocd \
--create-namespace \
--set server.insecure=true \
--wait
02. Verify Installation¶
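List the pods in the `argocd` namespace to confirm everything came up:

```shell
kubectl get pods -n argocd
```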
Expected output (all pods Running):
NAME READY STATUS
argocd-application-controller-0 1/1 Running
argocd-dex-server-xxxx 1/1 Running
argocd-redis-xxxx 1/1 Running
argocd-repo-server-xxxx 1/1 Running
argocd-server-xxxx 1/1 Running
03. Get Admin Password¶
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d; echo
Note
Save this password - you’ll need it to log in to the ArgoCD UI and CLI.
Part 03 - Expose ArgoCD via Ingress¶
An Nginx Ingress allows you to access the ArgoCD UI at http://argocd.local instead of requiring port-forwarding.
Prerequisite
Nginx Ingress Controller must be installed in the cluster.
01. Apply the Ingress¶
The Ingress forwards HTTP traffic to the ArgoCD server which runs in --insecure mode (no TLS at the pod level):
# manifests/argocd-ingress.yaml (summary)
metadata:
annotations:
nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
ingressClassName: nginx
rules:
- host: argocd.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: argocd-server
port:
number: 80
02. Add to /etc/hosts¶
# Get the node IP
INGRESS_IP=$(kubectl get nodes \
-o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
# Add entry
echo "${INGRESS_IP} argocd.local" | sudo tee -a /etc/hosts
# Open the UI
open http://argocd.local
03. Port-Forward Fallback¶
If Ingress is not available:
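You can reach the ArgoCD server directly through a port-forward (this matches the `argocd login localhost:8080` commands used later in this lab):

```shell
kubectl port-forward svc/argocd-server -n argocd 8080:80
# Then open http://localhost:8080
```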
Part 04 - ArgoCD CLI¶
01. Install the CLI¶
# macOS
brew install argocd
# Linux
curl -sSL -o argocd-linux-amd64 \
https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd
02. Login¶
# Via Ingress
argocd login argocd.local --insecure
# Via port-forward
argocd login localhost:8080 --insecure
# Change admin password (recommended)
argocd account update-password
Part 05 - Deploying the Guestbook Application¶
01. Create via Manifest¶
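The lab ships a standalone Application manifest (`guestbook-app.yaml`, see the directory structure above); applying it from the `Labs/18-ArgoCD` directory is presumably:

```shell
kubectl apply -f guestbook-app.yaml
```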
02. Create via CLI¶
argocd app create guestbook \
--repo https://github.com/argoproj/argocd-example-apps.git \
--path guestbook \
--dest-server https://kubernetes.default.svc \
--dest-namespace guestbook \
--sync-policy automated \
--auto-prune \
--self-heal
03. Monitor Sync¶
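Watch the application until it reports `Synced` and `Healthy`:

```shell
watch argocd app get guestbook
```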
04. Access the Guestbook¶
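The example app exposes a `guestbook-ui` Service (the same name is referenced in the self-healing test later in this lab), so a port-forward should work:

```shell
kubectl port-forward svc/guestbook-ui 8081:80 -n guestbook
# Then open http://localhost:8081
```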
Part 06 - App of Apps Pattern¶
The App of Apps pattern uses a single root ArgoCD Application as a controller that watches a Git directory for Application manifests. Adding an app is as simple as committing a new YAML file.
GitOps Flow¶
sequenceDiagram
participant Dev as Developer
participant Git as Git Repository
participant ArgoCD as ArgoCD
participant K8s as Kubernetes
Dev->>Git: git push (new app YAML in apps/)
ArgoCD->>Git: Poll for changes (every 3 min)
Git-->>ArgoCD: Detects new Application manifest
ArgoCD->>K8s: Creates child Application resource
ArgoCD->>Git: Fetches manifests from child app's source
Git-->>ArgoCD: Returns Helm/Kustomize/YAML manifests
ArgoCD->>K8s: Deploys resources to target namespace
K8s-->>ArgoCD: Reports health status
01. Deploy the App of Apps¶
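From the `Labs/18-ArgoCD` directory, apply the root Application manifest (`apps/app-of-apps.yaml` per the directory structure above):

```shell
kubectl apply -f apps/app-of-apps.yaml
```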
ArgoCD will:
- Detect the `apps/` directory in the repo
- Create a child Application for each `.yaml` file found there
- Each child Application then deploys its own resources
02. Verify All Applications¶
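List all applications ArgoCD now manages:

```shell
argocd app list
```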
Expected output:
NAME CLUSTER NAMESPACE STATUS HEALTH SYNCPOLICY
app-of-apps in-cluster argocd Synced Healthy Auto-Prune
efk-stack in-cluster argocd Synced Healthy Auto-Prune
efk-elasticsearch in-cluster efk Synced Healthy Auto-Prune
efk-filebeat in-cluster efk Synced Healthy Auto-Prune
efk-kibana in-cluster efk Synced Healthy Auto-Prune
efk-log-generator in-cluster efk Synced Healthy Auto-Prune
efk-log-processor in-cluster efk Synced Healthy Auto-Prune
guestbook in-cluster guestbook Synced Healthy Auto-Prune
03. Add a New Application¶
To add a new application, commit a new manifest to apps/:
cat > apps/my-new-app.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-new-app
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/my-org/my-repo.git
targetRevision: HEAD
path: manifests
destination:
server: https://kubernetes.default.svc
namespace: my-app
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
EOF
git add apps/my-new-app.yaml
git commit -m "feat: add my-new-app to App of Apps"
git push
# ArgoCD automatically detects and deploys the new application
Part 07 - Auto-Sync and Self-Healing¶
Enable Auto-Sync¶
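If the app was created without an automated sync policy, it can be enabled afterwards (flags as used elsewhere in this lab):

```shell
argocd app set guestbook \
  --sync-policy automated \
  --auto-prune \
  --self-heal
```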
Test Self-Healing¶
# Manually break the desired state
kubectl scale deployment guestbook-ui --replicas=5 -n guestbook
# ArgoCD detects the drift and restores the desired replica count from Git
watch argocd app get guestbook
Part 08 - Rollback¶
# View deployment history
argocd app history guestbook
# Rollback to a specific revision
argocd app rollback guestbook <revision-id>
# Verify the rollback
argocd app get guestbook
kubectl get all -n guestbook
Part 09 - Sync Waves (Deployment Ordering)¶
Sync waves control the order in which resources are applied during a sync. Resources in wave N are applied only after all resources in wave N-1 are healthy. The EFK App of Apps uses waves to ensure Elasticsearch is ready before Filebeat and Kibana start.
# wave 0: Elasticsearch (must be ready first)
metadata:
annotations:
argocd.argoproj.io/sync-wave: "0"
# wave 1: Filebeat and Kibana (require Elasticsearch)
metadata:
annotations:
argocd.argoproj.io/sync-wave: "1"
# wave 2: Log Generator and Processor (require everything else)
metadata:
annotations:
argocd.argoproj.io/sync-wave: "2"
Part 10 - Working with Helm Charts¶
# Deploy a Helm chart from a registry
argocd app create nginx-helm \
--repo https://charts.bitnami.com/bitnami \
--helm-chart nginx \
--revision 15.1.0 \
--dest-server https://kubernetes.default.svc \
--dest-namespace default \
--helm-set service.type=NodePort \
--helm-set replicaCount=3
argocd app sync nginx-helm
Part 11 - Troubleshooting¶
ArgoCD Server Not Accessible¶
# Check ArgoCD pods
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server
# Check Ingress
kubectl get ingress -n argocd
kubectl describe ingress argocd-server-ingress -n argocd
Application Stuck in Progressing¶
argocd app get <app-name>
kubectl describe application <app-name> -n argocd
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller --tail=50
Out of Sync / Repository Error¶
# Show the diff
argocd app diff <app-name>
# Force refresh from Git
argocd app get <app-name> --refresh
# Force sync
argocd app sync <app-name> --force
App of Apps Not Creating Child Apps¶
# Check root app is synced
argocd app get app-of-apps
# Check ArgoCD can reach the repo
argocd repo list
# Check the apps/ directory exists in the configured path
argocd app manifests app-of-apps
Cleanup¶
# Full cleanup via demo.sh
./demo.sh cleanup
# Or manually
kubectl delete applications --all -n argocd
helm uninstall argocd --namespace argocd
kubectl delete namespace argocd guestbook efk
Part 12 - End-to-End: From Helm Chart to GitOps with ArgoCD¶
This part walks through the complete lifecycle in baby steps - starting from zero, building a Helm chart from scratch, installing ArgoCD, and then managing the chart purely through GitOps.
By the end you will have a working ArgoCD setup where every git push automatically deploys your Helm chart.
flowchart LR
A["Step 1-3\nCreate Helm Chart"] --> B["Step 4-5\nValidate & Commit to Git"]
B --> C["Step 6-7\nInstall ArgoCD"]
C --> D["Step 8-9\nConfigure & Login"]
D --> E["Step 10-11\nCreate ArgoCD App"]
E --> F["Step 12-13\nGitOps in Action"]
F --> G["Step 14\nApp of Apps"]
Step 01 - Install Prerequisites¶
Install the required tools before starting.
Verify tools are ready¶
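A quick sanity check that each required CLI is on the `PATH` (these are standard version flags for the respective tools):

```shell
kubectl version --client
helm version --short
argocd version --client
git --version
```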
Expected output (versions may differ):
Step 02 - Create a Helm Chart from Scratch¶
We will build the my-webserver chart - a simple nginx-based web server.
02.01 Scaffold the chart¶
# Create a fresh chart skeleton
helm create my-webserver
# Inspect the generated structure
find my-webserver -type f | sort
You will see:
my-webserver/
βββ Chart.yaml # Chart metadata (name, version, appVersion)
βββ values.yaml # Default configuration values
βββ charts/ # Subchart dependencies (empty for now)
βββ templates/
βββ _helpers.tpl # Reusable named templates
βββ deployment.yaml # Deployment resource template
βββ hpa.yaml # HorizontalPodAutoscaler (optional)
βββ ingress.yaml # Ingress resource (optional)
βββ NOTES.txt # Post-install message shown to user
βββ service.yaml # Service resource template
βββ serviceaccount.yaml
βββ tests/
βββ test-connection.yaml
02.02 Clean up the scaffold (keep only what we need)¶
# Remove files we will not use in this demo
rm my-webserver/templates/hpa.yaml
rm my-webserver/templates/serviceaccount.yaml
rm my-webserver/templates/ingress.yaml
02.03 Rewrite Chart.yaml¶
Replace the contents of my-webserver/Chart.yaml with:
apiVersion: v2
name: my-webserver
description: A simple nginx web server - Helm + ArgoCD demo
type: application
version: 1.0.0
appVersion: "1.25.3"
version vs appVersion
- `version` is the chart version - bump it every time you release a new chart.
- `appVersion` is the version of the application (the nginx image tag) shipped inside the chart.
02.04 Rewrite values.yaml¶
Replace the contents of my-webserver/values.yaml with a minimal, well-documented set of values:
# Number of nginx pods to run
replicaCount: 1
image:
repository: nginx
tag: "1.25.3"
pullPolicy: IfNotPresent
# The HTML content served by nginx
greeting: "Hello from my Helm chart + ArgoCD!"
service:
type: ClusterIP
port: 80
# Resource limits/requests
resources:
requests:
cpu: "50m"
memory: "64Mi"
limits:
cpu: "200m"
memory: "128Mi"
02.05 Rewrite the Deployment template¶
Replace the contents of my-webserver/templates/deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "my-webserver.fullname" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "my-webserver.labels" . | nindent 4 }}
spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
{{- include "my-webserver.selectorLabels" . | nindent 6 }}
template:
metadata:
labels:
{{- include "my-webserver.selectorLabels" . | nindent 8 }}
spec:
containers:
- name: nginx
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- containerPort: 80
# Inject a simple HTML index from the ConfigMap
volumeMounts:
- name: html
mountPath: /usr/share/nginx/html
resources:
{{- toYaml .Values.resources | nindent 12 }}
volumes:
- name: html
configMap:
name: {{ include "my-webserver.fullname" . }}-html
02.06 Add a ConfigMap template¶
Create my-webserver/templates/configmap.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "my-webserver.fullname" . }}-html
namespace: {{ .Release.Namespace }}
labels:
{{- include "my-webserver.labels" . | nindent 4 }}
data:
index.html: |
<!DOCTYPE html>
<html>
<head><title>{{ .Values.greeting }}</title></head>
<body>
<h1>{{ .Values.greeting }}</h1>
<p>Release: <strong>{{ .Release.Name }}</strong> |
Revision: <strong>{{ .Release.Revision }}</strong> |
Namespace: <strong>{{ .Release.Namespace }}</strong></p>
</body>
</html>
02.07 Update NOTES.txt¶
Replace my-webserver/templates/NOTES.txt:
{{ .Chart.Name }} v{{ .Chart.Version }} installed as release "{{ .Release.Name }}"
Quick access:
kubectl port-forward svc/{{ include "my-webserver.fullname" . }} 8080:80 -n {{ .Release.Namespace }}
Open http://localhost:8080 in your browser
To check pod status:
kubectl get pods -n {{ .Release.Namespace }} -l app.kubernetes.io/instance={{ .Release.Name }}
Step 03 - Validate the Chart¶
Always validate a chart before committing or installing it.
03.01 Lint¶
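Run the linter against the chart directory:

```shell
helm lint my-webserver
```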
Expected: 1 chart(s) linted, 0 chart(s) failed
03.02 Render templates locally (dry-run without a cluster)¶
# Render to stdout and inspect what will be applied
helm template my-release my-webserver --namespace my-ns
# Render to a file for closer inspection
helm template my-release my-webserver \
--namespace my-ns \
--output-dir /tmp/rendered-my-webserver
ls /tmp/rendered-my-webserver/my-webserver/templates/
03.03 Test override values locally¶
# Preview with 3 replicas and a custom greeting
helm template my-release my-webserver \
--set replicaCount=3 \
--set greeting="GitOps is awesome!"
Look for replicas: 3 and your custom greeting in the output.
Step 04 - Install the Chart Locally (Optional Smoke Test)¶
Before hooking things up to ArgoCD, confirm the chart deploys correctly.
# Install into a dedicated namespace
helm upgrade --install my-webserver my-webserver \
--namespace my-webserver \
--create-namespace \
--wait
# Verify pods are running
kubectl get all -n my-webserver
# Quick curl test via port-forward
kubectl port-forward svc/my-webserver 8080:80 -n my-webserver &
sleep 2
curl -s http://localhost:8080 | grep 'Hello'
# Stop the port-forward
kill %1
If the page shows your greeting, the chart is working. Uninstall before handing over to ArgoCD:
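A likely uninstall sequence, matching the release and namespace names used above:

```shell
helm uninstall my-webserver --namespace my-webserver
kubectl delete namespace my-webserver
```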
Step 05 - Commit the Chart to Git¶
ArgoCD is a pull-based GitOps tool - it watches a Git repository and deploys whatever is there. Your chart must live in a Git repository that ArgoCD can reach.
05.01 Recommended directory layout inside the repo¶
my-repo/
├── charts/
│   └── my-webserver/             ← the Helm chart we just created
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
└── argocd/
    └── my-webserver-app.yaml     ← ArgoCD Application manifest (added in Step 10)
05.02 Commit and push¶
# From the root of your Git repository
mkdir -p charts
cp -r my-webserver charts/
git add charts/my-webserver/
git commit -m "feat: add my-webserver Helm chart v1.0.0"
git push
Using this KubernetesLabs repo
If you are working inside the KubernetesLabs repository, place your chart under Labs/18-ArgoCD/charts/my-webserver/ so it is already reachable via https://github.com/nirgeier/KubernetesLabs.
Step 06 - Install ArgoCD via Helm¶
Now we install ArgoCD itself into the cluster.
06.01 Add the Argo Helm repository¶
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update argo
# Confirm the chart is available
helm search repo argo/argo-cd
06.02 Install ArgoCD¶
helm upgrade --install argocd argo/argo-cd \
--namespace argocd \
--create-namespace \
--set server.insecure=true \
--wait
--set server.insecure=true
This disables TLS at the ArgoCD server pod so that a plain HTTP Ingress works without certificate configuration. In production you should terminate TLS at the Ingress instead.
06.03 Verify all pods are Running¶
Expected (all 1/1 Running):
NAME READY STATUS
argocd-application-controller-0 1/1 Running
argocd-applicationset-controller-xxxx 1/1 Running
argocd-dex-server-xxxx 1/1 Running
argocd-notifications-controller-xxxx 1/1 Running
argocd-redis-xxxx 1/1 Running
argocd-repo-server-xxxx 1/1 Running
argocd-server-xxxx 1/1 Running
Step 07 - Expose the ArgoCD UI¶
Choose one of the methods below depending on your environment.
Option A - Port-Forward (simplest, no Ingress needed)¶
kubectl port-forward svc/argocd-server -n argocd 8080:80 &
echo "ArgoCD UI → http://localhost:8080"
Option B - Nginx Ingress (persistent URL)¶
Requires Nginx Ingress Controller
Install it first if not present:
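The same install command from the Prerequisites section works here:

```shell
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace
```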
Apply the Ingress manifest already in this lab:
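From the `Labs/18-ArgoCD` directory:

```shell
kubectl apply -f manifests/argocd-ingress.yaml
```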
Add the cluster IP to /etc/hosts:
INGRESS_IP=$(kubectl get nodes \
-o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
echo "${INGRESS_IP} argocd.local" | sudo tee -a /etc/hosts
echo "ArgoCD UI → http://argocd.local"
Step 08 - Retrieve the Admin Password¶
ArgoCD generates a random admin password on first install. Retrieve it with:
# Decode and print on a single line
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d; echo
Save this password
Store it somewhere safe before proceeding. You will use it to log in to the Web UI and the CLI.
Step 09 - Log in to ArgoCD¶
09.01 Web UI¶
Open the URL from Step 07 in your browser.
Username: admin
Password: (the password from Step 08)
09.02 ArgoCD CLI¶
# Via port-forward (Option A)
argocd login localhost:8080 \
--username admin \
--password "$(kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath='{.data.password}' | base64 -d)" \
--insecure
# Via Ingress (Option B)
argocd login argocd.local \
--username admin \
--password "$(kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath='{.data.password}' | base64 -d)" \
--insecure
09.03 (Recommended) Change the admin password¶
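Use the same command shown in Part 04:

```shell
argocd account update-password
```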
09.04 Verify login¶
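A quick check that the CLI session works - listing applications should succeed (the list may be empty at this point):

```shell
argocd app list
```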
Step 10 - Create an ArgoCD Application for the Helm Chart¶
There are three equivalent ways to create an ArgoCD Application. We will cover all three so you understand what each does.
Method A - ArgoCD CLI¶
argocd app create my-webserver \
--repo https://github.com/nirgeier/KubernetesLabs.git \
--path Labs/18-ArgoCD/charts/my-webserver \
--dest-server https://kubernetes.default.svc \
--dest-namespace my-webserver \
--helm-set replicaCount=2 \
--sync-policy automated \
--auto-prune \
--self-heal \
--sync-option CreateNamespace=true
| Flag | What it does |
|---|---|
| `--repo` | Git repository URL |
| `--path` | Path inside the repo where the chart lives |
| `--dest-namespace` | The Kubernetes namespace to deploy into |
| `--helm-set` | Override a chart value (same as `--set` in `helm install`) |
| `--sync-policy automated` | ArgoCD will automatically apply every Git change |
| `--auto-prune` | Delete resources removed from Git |
| `--self-heal` | Restore any manual cluster changes back to Git state |
| `--sync-option CreateNamespace=true` | Create the namespace if it does not exist |
Method B - Kubernetes Manifest¶
Prefer this method
Manifests are version-controlled, repeatable, and fit perfectly into the App of Apps pattern.
Create argocd/my-webserver-app.yaml in your repo with:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-webserver
namespace: argocd
# Ensures child resources are deleted when this Application is deleted
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/nirgeier/KubernetesLabs.git
targetRevision: HEAD # Track the default branch
path: Labs/18-ArgoCD/charts/my-webserver
helm:
# Override chart values directly in the Application manifest
values: |
replicaCount: 2
greeting: "Deployed by ArgoCD!"
destination:
server: https://kubernetes.default.svc
namespace: my-webserver
syncPolicy:
automated:
prune: true # Remove resources deleted from Git
selfHeal: true # Revert manual cluster changes
syncOptions:
- CreateNamespace=true
Apply it:
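Apply the manifest with kubectl:

```shell
kubectl apply -f argocd/my-webserver-app.yaml
```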
Method C - ArgoCD Web UI¶
1. Click "+ NEW APP" in the top-left of the UI.
2. Fill in:
    - Application Name: `my-webserver`
    - Project: `default`
    - Sync Policy: `Automatic`, with Prune Resources and Self Heal checked
    - Source:
        - Repository URL: `https://github.com/nirgeier/KubernetesLabs.git`
        - Revision: `HEAD`
        - Path: `Labs/18-ArgoCD/charts/my-webserver`
    - Destination:
        - Cluster URL: `https://kubernetes.default.svc`
        - Namespace: `my-webserver`
    - The Helm section expands automatically since ArgoCD detects the chart. Add an override: `replicaCount=2`
3. Click CREATE.
Step 11 - Watch the Sync and Verify the Deployment¶
11.01 Monitor sync via CLI¶
Expected output:
Name: argocd/my-webserver
Project: default
Sync Status: Synced
Health Status: Healthy
GROUP KIND NAMESPACE NAME STATUS HEALTH
Namespace my-webserver my-webserver Synced Healthy
ConfigMap my-webserver my-webserver-html Synced Healthy
apps Deployment my-webserver my-webserver Synced Healthy
Service my-webserver my-webserver Synced Healthy
11.02 Verify Kubernetes resources¶
Expected:
NAME READY STATUS RESTARTS
pod/my-webserver-xxxx 1/1 Running 0
pod/my-webserver-xxxx 1/1 Running 0
NAME TYPE CLUSTER-IP PORT(S)
service/my-webserver ClusterIP 10.x.x.x 80/TCP
NAME READY UP-TO-DATE AVAILABLE
deployment.apps/my-webserver 2/2 2 2
11.03 Test the application¶
kubectl port-forward svc/my-webserver 8080:80 -n my-webserver &
sleep 2
curl -s http://localhost:8080 | grep 'Deployed by ArgoCD'
# Stop port-forward
kill %1
Step 12 - GitOps in Action: Make a Change via Git¶
This is the key GitOps moment - you never run kubectl or helm upgrade.
Instead, you push a change to Git and ArgoCD applies it automatically.
12.01 Update the chart values in Git¶
Open charts/my-webserver/values.yaml (in your repo) and change:
# Before
replicaCount: 1
greeting: "Hello from my Helm chart + ArgoCD!"
# After
replicaCount: 3
greeting: "Updated via GitOps - no kubectl needed!"
Commit and push:
git add charts/my-webserver/values.yaml
git commit -m "feat: scale to 3 replicas and update greeting"
git push
12.02 Watch ArgoCD detect and apply the change¶
# ArgoCD polls Git every 3 minutes by default.
# You can trigger an immediate refresh:
argocd app get my-webserver --refresh
# Then watch the sync happen:
watch argocd app get my-webserver
Within seconds of the refresh ArgoCD will:
1. Detect the diff between Git (3 replicas) and the cluster (2 replicas)
2. Apply the updated Deployment
3. Report Synced + Healthy once the 3rd pod is running
12.03 Verify the change¶
# Should show 3/3 ready
kubectl get deployment my-webserver -n my-webserver
# Test the new greeting
kubectl port-forward svc/my-webserver 8080:80 -n my-webserver &
sleep 2
curl -s http://localhost:8080 | grep 'GitOps'
kill %1
12.04 Test Self-Healing¶
ArgoCD’s self-heal feature will restore any manual change that diverges from Git state.
# Manually scale to 1 replica (simulating an accidental change)
kubectl scale deployment my-webserver --replicas=1 -n my-webserver
# ArgoCD immediately detects the drift
argocd app get my-webserver --refresh
# Within ~15-30 seconds ArgoCD restores 3 replicas
watch kubectl get pods -n my-webserver
You should see the replica count jump back from 1 to 3 automatically.
Step 13 - Bump the Chart Version and Upgrade¶
When you change the chart templates themselves (not just values), bump the chart version.
13.01 Add a new label to all resources¶
Open `charts/my-webserver/templates/_helpers.tpl` and add an `environment` label to the `my-webserver.labels` template:
{{/*
Common labels
*/}}
{{- define "my-webserver.labels" -}}
helm.sh/chart: {{ include "my-webserver.chart" . }}
{{ include "my-webserver.selectorLabels" . }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
environment: {{ .Values.environment | default "dev" }}
{{- end }}
13.02 Add environment to values.yaml¶
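Append a default for the new value to `charts/my-webserver/values.yaml` (the value `"staging"` is just an example - the helper above falls back to `"dev"` when it is unset):

```yaml
# Environment name surfaced as a label on every resource
environment: "staging"
```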
13.03 Bump the chart version in Chart.yaml¶
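In `charts/my-webserver/Chart.yaml`, bump the chart version (the commit message in the next step suggests v1.1.0); `appVersion` stays the same because the nginx image did not change:

```yaml
version: 1.1.0        # was 1.0.0 - the templates changed
appVersion: "1.25.3"  # unchanged
```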
13.04 Commit, push, and let ArgoCD upgrade¶
git add charts/my-webserver/
git commit -m "feat: add environment label, bump chart to v1.1.0"
git push
ArgoCD detects the new chart version and runs a rolling upgrade.
13.05 Check the Helm history ArgoCD tracks¶
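Each synced Git revision shows up as a deployment revision:

```shell
argocd app history my-webserver
```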
Step 14 - Rollback via ArgoCD¶
If a deployment breaks production you can roll back to any previous revision in seconds.
# 1. See what revisions are available
argocd app history my-webserver
# Output example:
# ID DATE REVISION INITIATOR
# 1 2026-02-01 abc1234 automated
# 2 2026-02-10 def5678 automated
# 3 2026-02-22 fed9876 automated
# 2. Roll back to revision 2
argocd app rollback my-webserver 2
# 3. Verify the rollback
argocd app get my-webserver
kubectl get deployment my-webserver -n my-webserver -o wide
Rollback and Auto-Sync
Rollback disables automated sync to prevent ArgoCD from immediately re-applying the newer Git state. Re-enable it when you are ready:
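Re-enabling uses the same command as in the cheatsheet below:

```shell
argocd app set my-webserver --sync-policy automated --self-heal
```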
Step 15 - App of Apps: Managing Multiple Helm Charts Declaratively¶
Once you have more than one application, use the App of Apps pattern so every child application is itself version-controlled and managed by ArgoCD.
15.01 Create a second chart (optional)¶
# Clone or duplicate my-webserver as my-api
cp -r charts/my-webserver charts/my-api
sed -i '' 's/my-webserver/my-api/g' charts/my-api/Chart.yaml
sed -i '' 's/my-webserver/my-api/g' charts/my-api/values.yaml
15.02 Create Application manifests for each service¶
Create argocd/apps/my-webserver-app.yaml:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-webserver
namespace: argocd
annotations:
# Deploy only after a wave-0 infrastructure app is healthy (if needed)
argocd.argoproj.io/sync-wave: "1"
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/nirgeier/KubernetesLabs.git
targetRevision: HEAD
path: Labs/18-ArgoCD/charts/my-webserver
helm:
values: |
replicaCount: 2
greeting: "Front-end service"
environment: "production"
destination:
server: https://kubernetes.default.svc
namespace: my-webserver
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
Create argocd/apps/my-api-app.yaml:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-api
namespace: argocd
annotations:
argocd.argoproj.io/sync-wave: "1"
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/nirgeier/KubernetesLabs.git
targetRevision: HEAD
path: Labs/18-ArgoCD/charts/my-api
helm:
values: |
replicaCount: 1
greeting: "API service"
environment: "production"
destination:
server: https://kubernetes.default.svc
namespace: my-api
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
15.03 Create the root App of Apps manifest¶
Create argocd/app-of-apps.yaml:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: app-of-apps
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/nirgeier/KubernetesLabs.git
targetRevision: HEAD
# ArgoCD will read EVERY .yaml file in this directory
# and create an Application resource for each one
path: Labs/18-ArgoCD/argocd/apps
destination:
server: https://kubernetes.default.svc
namespace: argocd # child Application CRs live in argocd ns
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
15.04 Commit everything and bootstrap¶
Apply only the root application - ArgoCD takes care of the rest:
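A plausible sequence (the commit message is illustrative):

```shell
# Commit the new chart and the Application manifests
git add charts/my-api argocd/
git commit -m "feat: add my-api chart and App of Apps manifests"
git push

# Bootstrap: apply ONLY the root application
kubectl apply -f argocd/app-of-apps.yaml
```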
15.05 Watch all applications appear¶
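List the applications as the root app fans out:

```shell
argocd app list
```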
Expected:
NAME CLUSTER NAMESPACE STATUS HEALTH SYNCPOLICY
app-of-apps in-cluster argocd Synced Healthy Auto-Prune
my-api in-cluster my-api Synced Healthy Auto-Prune
my-webserver in-cluster my-webserver Synced Healthy Auto-Prune
15.06 Add a new application - zero extra operators needed¶
From now on, adding any new Helm chart is just:
- Add the chart under `charts/`
- Add an `Application` manifest under `argocd/apps/`
- `git push`
ArgoCD automatically detects the new file and deploys it. No helm install, no kubectl apply needed.
Quick Reference: Helm + ArgoCD Workflow Cheatsheet¶
| Goal | Command |
|---|---|
| Create chart skeleton | helm create <name> |
| Render templates locally | helm template <release> <chart> |
| Validate chart | helm lint <chart> |
| Preview upgrade diff | argocd app diff <app> |
| Trigger immediate sync | argocd app sync <app> |
| Watch sync status | watch argocd app get <app> |
| List all apps | argocd app list |
| Roll back to revision N | argocd app rollback <app> <N> |
| Show deployment history | argocd app history <app> |
| Force refresh from Git | argocd app get <app> --refresh |
| Pause auto-sync | argocd app set <app> --sync-policy none |
| Resume auto-sync | argocd app set <app> --sync-policy automated --self-heal |
| Delete app + resources | argocd app delete <app> |
Part 12 Cleanup¶
# Delete all managed applications
argocd app delete app-of-apps --cascade # removes all child apps too
argocd app delete my-webserver --cascade
argocd app delete my-api --cascade
# Remove the namespaces
kubectl delete namespace my-webserver my-api
# Uninstall ArgoCD itself
helm uninstall argocd --namespace argocd
kubectl delete namespace argocd
Helm Operator¶
- An in-depth Helm-based operator tutorial.
- The `Helm Operator` is a Kubernetes operator that lets you declaratively manage Helm chart releases.
What will we learn?¶
- What the Helm Operator is and how it works
- How to create a Helm-based operator using `operator-sdk`
- How to manage Helm chart releases declaratively through Custom Resources
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster
- Docker installed (for building operator images)
- `operator-sdk` CLI installed (steps below)
Install operator-sdk¶
# Grab the ARCH and OS
export ARCH=$(case $(uname -m) in x86_64) echo -n amd64 ;; aarch64) echo -n arm64 ;; *) echo -n $(uname -m) ;; esac)
export OS=$(uname | awk '{print tolower($0)}')
# Get the desired download URL
export OPERATOR_SDK_DL_URL=https://github.com/operator-framework/operator-sdk/releases/download/v1.23.0
# Download the Operator binaries
curl -LO ${OPERATOR_SDK_DL_URL}/operator-sdk_${OS}_${ARCH}
# Install the release binary in your PATH
chmod +x operator-sdk_${OS}_${ARCH} && sudo mv operator-sdk_${OS}_${ARCH} /usr/local/bin/operator-sdk
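Confirm the binary is installed and on the `PATH`:

```shell
operator-sdk version
```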
Step 01 - Create a new project¶
- Use the CLI to create a new Helm-based nginx-operator project:
# Create the desired folder
mkdir nginx-operator
# Switch to the desired folder
cd nginx-operator
# Create the helm operator
operator-sdk \
init \
--kind Nginx \
--group demo \
--plugins helm \
--version v1alpha1 \
--domain codewizard.co.il
- This creates the `nginx-operator` project, set up to watch the `Nginx` resource with APIVersion `demo.codewizard.co.il/v1alpha1` and Kind `Nginx`.
Operator SDK Project Layout¶
- The command will generate the following structure:
.
├── Dockerfile
├── Makefile
├── PROJECT
├── config
│   ├── crd
│   │   ├── bases
│   │   │   └── demo.codewizard.co.il_nginxes.yaml
│   │   └── kustomization.yaml
│   ├── default
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml
│   │   └── manager_config_patch.yaml
│   ├── manager
│   │   ├── controller_manager_config.yaml
│   │   ├── kustomization.yaml
│   │   └── manager.yaml
│   ├── manifests
│   │   └── kustomization.yaml
│   ├── prometheus
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml
│   ├── rbac
│   │   ├── auth_proxy_client_clusterrole.yaml
│   │   ├── auth_proxy_role.yaml
│   │   ├── auth_proxy_role_binding.yaml
│   │   ├── auth_proxy_service.yaml
│   │   ├── kustomization.yaml
│   │   ├── leader_election_role.yaml
│   │   ├── leader_election_role_binding.yaml
│   │   ├── nginx_editor_role.yaml
│   │   ├── nginx_viewer_role.yaml
│   │   ├── role.yaml
│   │   ├── role_binding.yaml
│   │   └── service_account.yaml
│   ├── samples
│   │   ├── demo_v1alpha1_nginx.yaml
│   │   └── kustomization.yaml
│   └── scorecard
│       ├── bases
│       │   └── config.yaml
│       ├── kustomization.yaml
│       └── patches
│           ├── basic.config.yaml
│           └── olm.config.yaml
├── helm-charts
│   └── nginx
│       ├── Chart.yaml
│       ├── templates
│       │   ├── NOTES.txt
│       │   ├── _helpers.tpl
│       │   ├── deployment.yaml
│       │   ├── hpa.yaml
│       │   ├── ingress.yaml
│       │   ├── service.yaml
│       │   ├── serviceaccount.yaml
│       │   └── tests
│       │       └── test-connection.yaml
│       └── values.yaml
├── tree.txt
└── watches.yaml

16 directories, 44 files
Step 02 - Customize the operator logic¶
- For this example the nginx-operator will execute the following reconciliation logic for each Nginx Custom Resource (CR):
- Create an nginx Deployment, if it doesnβt exist.
- Create an nginx Service, if it doesnβt exist.
- Create an nginx Ingress, if it is enabled and doesnβt exist.
- Update the Deployment, Service, and Ingress, if they already exist but donβt match the desired configuration as specified by the Nginx CR.
- Ensure that the Deployment, Service, and optional Ingress all match the desired configuration (e.g. replica count, image, service type, etc) as specified by the Nginx CR.
Watch the Nginx CR¶
- By default, the nginx-operator watches `Nginx` resource events, as declared in `watches.yaml`, and executes Helm releases using the specified chart:
# Use the 'create api' subcommand to add watches to this file.
- group: demo
  version: v1alpha1
  kind: Nginx
  chart: helm-charts/nginx
Reviewing the Nginx Helm Chart¶
- When a Helm operator project is created, the SDK creates an example Helm chart that contains a set of templates for a simple Nginx release.
- For this example, we have templates for deployment, service, and ingress resources, along with a `NOTES.txt` template, which Helm chart developers use to convey helpful information about a release.
Understanding the Nginx CR spec¶
- Helm uses a concept called `values` to provide customizations to a Helm chart’s defaults, which are defined in the chart’s `values.yaml` file.
- Overriding these defaults is as simple as setting the desired values in the CR spec.
- Let’s use the number of replicas as an example.
- First, inspecting `helm-charts/nginx/values.yaml`, we can see that the chart has a value called `replicaCount`, set to `1` by default.
- Let’s update the value to 3: `replicaCount: 3`.
# Update `config/samples/demo_v1alpha1_nginx.yaml` to look like the following:
apiVersion: demo.codewizard.co.il/v1alpha1
kind: Nginx
metadata:
  name: nginx-sample
spec:
  #... (Around line 33)
  replicaCount: 3 # <------- Adding our replicas count
- Similarly, we see that the default service port is set to `80`, but we would like to use `8888`, so we will again update `config/samples/demo_v1alpha1_nginx.yaml` by adding the service port override.
# Update `config/samples/demo_v1alpha1_nginx.yaml` to look like the following:
apiVersion: demo.codewizard.co.il/v1alpha1
kind: Nginx
metadata:
  name: nginx-sample
spec:
  #... (Around line 36)
  service:
    port: 8888 # <------- Updating our service port
Step 03 - Build the operatorβs image¶
# Log in to your DockerHub / ACR / ECR or any other registry account
# Set the desired image name and tag
# In the Makefile update the following line
# Image URL to use all building/pushing image targets
IMG ?= controller:latest
# change it to your registry account
IMG ?= nirgeier/helm_operator:latest
- Now let’s build and push the image:
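The scaffolded Makefile provides `docker-build` and `docker-push` targets for this; a typical invocation (the image name below is an example, use your own registry) is:

```shell
# Build the operator image and push it to the registry
# (overrides the IMG variable from the Makefile)
make docker-build docker-push IMG=nirgeier/helm_operator:latest
```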
Step 04 - Deploy the operator to the cluster¶
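The scaffolded Makefile also provides a `deploy` target that installs the CRD and the controller manager; a minimal sketch (image name is an example):

```shell
# Install the CRD and deploy the operator to the cluster
make deploy IMG=nirgeier/helm_operator:latest

# Verify that the controller manager is running
kubectl get pods -n nginx-operator-system
```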
Step 05 - Create the custom Nginx¶
# Deploy the custom nginx we created earlier
kubectl apply -f config/samples/demo_v1alpha1_nginx.yaml
# Ensure that the nginx-operator created the deployment
kubectl get deployment | grep nginx-sample
# Check that we have 3 replicas as defined earlier
kubectl get pods | grep nginx-sample
# Check that the port is set to 8888
kubectl get svc | grep nginx-sample
Step 06 - Check the operator logic¶
# Update the replicaCount and remove the port
# Once we update the yaml we will check that the operator is working
# and updating the desired values
# Update the replicaCount in `config/samples/demo_v1alpha1_nginx.yaml`
replicaCount: 5
# Comment out the service section in the yaml file
# We wish to see that the operator will use the default values
36 #service:
37 # port: 8888
38 # type: ClusterIP
- Apply the changes:
- Check to see that the operator is working as expected:
# Ensure that the nginx-operator is still running
kubectl get deployment | grep nginx-sample
# Deploy the custom nginx we created earlier
kubectl apply -f config/samples/demo_v1alpha1_nginx.yaml
# Check that we have 5 replicas as defined earlier
kubectl get pods | grep nginx-sample
# Check that the port is set back to its default (80)
kubectl get svc | grep nginx-sample
Step 07 - Logging / Debugging¶
- We can view the operator’s logs using the following command:
# View the operator logs
kubectl logs deployment.apps/nginx-operator-controller-manager -n nginx-operator-system -c manager
- Review the CR status and events:
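Assuming the `Nginx` CRD from this project is installed, the CR status and events can be inspected directly:

```shell
# List the Nginx custom resources
kubectl get nginx

# Show the status and recent events of our sample CR
kubectl describe nginx nginx-sample
```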
Kubeapps - Application Dashboard for Kubernetes¶

- Welcome to the `Kubeapps` hands-on lab! In this tutorial, you’ll learn how to install and use `Kubeapps`, a web-based UI for deploying and managing applications on your Kubernetes cluster using Helm charts.
- You’ll set up Kubeapps, configure authentication, browse application catalogs, deploy applications, and manage their lifecycle through the dashboard.
What will we learn?¶
- What `Kubeapps` is and why it is useful
- How to install `Kubeapps` using Helm
- How to configure authentication and RBAC for `Kubeapps`
- How to browse, deploy, upgrade, and delete applications from the dashboard
- How to add custom Helm repositories and private registries
- How to manage application catalogs and configurations
- Troubleshooting and best practices
Official Documentation & References¶
| Resource | Link |
|---|---|
| Kubeapps Official Site | kubeapps.dev |
| Kubeapps GitHub Repository | github.com/vmware-tanzu/kubeapps |
| Kubeapps Documentation | kubeapps.dev/docs |
| Bitnami Kubeapps Helm Chart | artifacthub.io/packages/helm/bitnami/kubeapps |
| Helm Official Docs | helm.sh/docs |
| Kubernetes RBAC | kubernetes.io/docs/reference/access-authn-authz/rbac |
Prerequisites¶
- A running Kubernetes cluster (minikube, kind, Docker Desktop, or cloud-managed)
- `kubectl` installed and configured to communicate with your cluster
- `Helm` (v3+) installed
- A web browser to access the Kubeapps dashboard
- Basic understanding of Helm charts (see Lab 13 - HelmChart)
Introduction¶
What is Kubeapps?¶
- `Kubeapps` is an in-cluster, web-based application that enables users to deploy and manage applications on a Kubernetes cluster using Helm charts.
- It provides a graphical user interface (GUI) for browsing, deploying, upgrading, and deleting Helm-based applications.
- Think of it as an “App Store” for your Kubernetes cluster.
Why use Kubeapps?¶
- Visual application catalog: Browse Helm charts from multiple repositories with a rich UI
- One-click deployments: Deploy complex applications without writing `helm install` commands
- Self-service: Enable developers to deploy applications without deep Kubernetes knowledge
- Multi-repository support: Aggregate charts from multiple Helm repositories and OCI registries
- Upgrade management: View available upgrades and apply them through the UI
- Multi-cluster support: Manage applications across multiple Kubernetes clusters
- RBAC integration: Control who can deploy what, based on Kubernetes RBAC
Architecture¶
graph TB
subgraph "User"
A[Browser]
end
subgraph "Kubernetes Cluster"
subgraph "Kubeapps Namespace"
B[Kubeapps Frontend<br>nginx]
C[Kubeapps APIs<br>kubeapps-internal-kubeappsapis]
D[AppRepository Controller]
E[PostgreSQL / Redis<br>Asset Cache]
end
subgraph "Helm Repositories"
F[Bitnami]
G[Custom Repos]
H[OCI Registries]
end
subgraph "Target Namespace"
I[Deployed Application<br>Pods, Services, etc.]
end
end
A -->|HTTPS| B
B --> C
C --> D
D --> F
D --> G
D --> H
C -->|Helm SDK| I
D --> E
style A fill:#326CE5,color:#fff
style B fill:#326CE5,color:#fff
style I fill:#326CE5,color:#fff
Key Components¶
| Component | Description |
|---|---|
| Kubeapps Frontend | Nginx-based web UI that users interact with through the browser |
| Kubeapps APIs | Backend service that handles Helm operations, catalog browsing, and auth |
| AppRepository Controller | Syncs Helm chart metadata from configured repositories into the asset cache |
| Asset Cache (DB) | PostgreSQL or Redis instance storing chart metadata for fast catalog browsing |
Common Operations¶
Below is a reference of common operations you’ll perform with Kubeapps, both through the UI and via command line.
Helm CLI Reference for Kubeapps¶
helm install - Install Kubeapps
Syntax: helm install kubeapps bitnami/kubeapps [options]
Description: Deploys Kubeapps to your Kubernetes cluster.
- Installs all Kubeapps components (frontend, API server, controller, database)
- Creates the necessary RBAC resources
- Configures the default chart repositories
# Basic install
helm install kubeapps bitnami/kubeapps \
  --namespace kubeapps --create-namespace
# Install with custom values
helm install kubeapps bitnami/kubeapps \
  --namespace kubeapps --create-namespace \
  -f custom-values.yaml
# Install with specific chart version
helm install kubeapps bitnami/kubeapps \
  --namespace kubeapps --create-namespace \
  --version 15.0.0
Create ServiceAccount and Token
Description: Create a ServiceAccount with cluster-admin permissions for Kubeapps authentication.
- Kubeapps uses Kubernetes tokens for authentication
- The ServiceAccount must have appropriate RBAC permissions
- For production, use more restrictive roles than cluster-admin
# Create a namespace for the operator
kubectl create namespace kubeapps-user
# Create a ServiceAccount
kubectl create serviceaccount kubeapps-operator \
  -n kubeapps-user
# Bind cluster-admin role
kubectl create clusterrolebinding kubeapps-operator \
  --clusterrole=cluster-admin \
  --serviceaccount=kubeapps-user:kubeapps-operator
# Create a token secret
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: kubeapps-operator-token
  namespace: kubeapps-user
  annotations:
    kubernetes.io/service-account.name: kubeapps-operator
type: kubernetes.io/service-account-token
EOF
# Get the token
kubectl get secret kubeapps-operator-token \
  -n kubeapps-user \
  -o go-template='{{.data.token | base64decode}}'
Access Kubeapps Dashboard
Description: Access the Kubeapps web UI through port-forwarding or Ingress.
- Port-forwarding is the simplest approach for development
- For production, configure an Ingress with TLS
Add Custom Helm Repository
Description: Add a custom Helm chart repository to Kubeapps.
- Can be done through the UI or using kubectl
- Supports public and private repositories
- Supports OCI-based registries
# Add a custom repository via kubectl
cat <<EOF | kubectl apply -f -
apiVersion: kubeapps.com/v1alpha1
kind: AppRepository
metadata:
  name: my-custom-repo
  namespace: kubeapps
spec:
  url: https://charts.example.com
  # For private repos, add auth:
  # auth:
  #   header:
  #     secretKeyRef:
  #       name: my-repo-auth
  #       key: authorizationHeader
EOF
RBAC Configuration¶
Kubeapps leverages Kubernetes RBAC to control access. Different roles provide different levels of access.
Role Examples¶
Read-Only User (View Only)¶
# kubeapps-viewer-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeapps-viewer
rules:
  # Allow listing Helm releases
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]
  # Allow viewing workloads
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets", "daemonsets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch"]
Namespace Deployer (Deploy to Specific Namespace)¶
# kubeapps-deployer-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubeapps-deployer
  namespace: my-team
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
Cluster Admin (Full Access)¶
# For lab/development purposes, bind to cluster-admin
kubectl create clusterrolebinding kubeapps-admin \
--clusterrole=cluster-admin \
--serviceaccount=kubeapps-user:kubeapps-operator
Production RBAC
Never use cluster-admin in production environments. Create specific roles that grant only the minimum permissions needed for each user or team.
Lab¶
Step 01 - Install Kubeapps¶
Add the Bitnami Helm repository¶
# Add the Bitnami repository
helm repo add bitnami https://charts.bitnami.com/bitnami
# Update the repository index
helm repo update
Install Kubeapps¶
# Install Kubeapps in its own namespace
helm install kubeapps bitnami/kubeapps \
--namespace kubeapps \
--create-namespace \
--wait
Verify the installation¶
# Check all pods are running
kubectl get pods -n kubeapps
# Expected output (pod names will vary):
# NAME READY STATUS RESTARTS AGE
# kubeapps-... 1/1 Running 0 2m
# kubeapps-internal-kubeappsapis-... 1/1 Running 0 2m
# kubeapps-internal-apprepository-controller-... 1/1 Running 0 2m
# kubeapps-postgresql-0 1/1 Running 0 2m
# Check all services
kubectl get svc -n kubeapps
Step 02 - Create Authentication Credentials¶
- `Kubeapps` uses Kubernetes tokens for authentication. We need to create a ServiceAccount and generate a token:
# Create a namespace for the Kubeapps user
kubectl create namespace kubeapps-user
# Create a ServiceAccount for the Kubeapps operator
kubectl create serviceaccount kubeapps-operator \
  -n kubeapps-user
# Bind the cluster-admin role to the service account
kubectl create clusterrolebinding kubeapps-operator \
--clusterrole=cluster-admin \
--serviceaccount=kubeapps-user:kubeapps-operator
# Create a Secret to generate the token
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: kubeapps-operator-token
  namespace: kubeapps-user
  annotations:
    kubernetes.io/service-account.name: kubeapps-operator
type: kubernetes.io/service-account-token
EOF
Retrieve the token¶
# Get the authentication token (save this - you'll need it to log in)
kubectl get secret kubeapps-operator-token \
-n kubeapps-user \
-o go-template='{{.data.token | base64decode}}'
Save the token
Copy the token output and save it somewhere accessible. You’ll paste it into the Kubeapps login page.
Step 03 - Access the Dashboard¶
Option A: Port Forwarding (Development)¶
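The troubleshooting section later in this lab forwards the same `kubeapps` service; to expose the dashboard on local port 8080:

```shell
# Forward local port 8080 to the Kubeapps frontend service
kubectl port-forward svc/kubeapps -n kubeapps 8080:80
```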
- Open your browser and navigate to: http://localhost:8080
- Paste the token from Step 02 into the login field
- Click Submit
Option B: Ingress (Production-like)¶
# kubeapps-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubeapps-ingress
  namespace: kubeapps
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - host: kubeapps.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kubeapps
                port:
                  number: 80
# Apply the ingress
kubectl apply -f kubeapps-ingress.yaml
# Add to /etc/hosts (Linux/macOS)
echo "127.0.0.1 kubeapps.local" | sudo tee -a /etc/hosts
# Access via: http://kubeapps.local
Step 04 - Browse the Application Catalog¶
Once logged in, you’ll see the Kubeapps dashboard:
- Catalog tab: Browse available Helm charts from configured repositories
- Applications tab: View deployed Helm releases
- Configuration tab: Manage app repositories and settings
Browse available charts¶
- Click on the Catalog tab
- Use the search bar to find applications (e.g., “nginx”, “redis”, “postgresql”)
- Click on a chart card to see details, including:
- Chart description and README
- Available versions
- Default configuration values
- Installation instructions
Step 05 - Deploy an Application via the UI¶
Let’s deploy NGINX using the Kubeapps dashboard:
- Click Catalog in the navigation
- Search for nginx
- Select bitnami/nginx from the results
- Click Deploy
- Configure the deployment:
- Name: `my-nginx`
- Namespace: `default` (or create a new one)
- Review the default values in the YAML editor
- Modify `replicaCount` to `2` if desired
- Click Deploy to install
Verify from the command line¶
# Check the Helm release
helm list --all-namespaces
# Check the deployed resources
kubectl get all -l app.kubernetes.io/instance=my-nginx
# Check the pods
kubectl get pods -l app.kubernetes.io/instance=my-nginx
Step 06 - Deploy an Application via CLI¶
You can also deploy applications that will appear in the Kubeapps dashboard using the Helm CLI:
# Deploy Redis via Helm (Kubeapps will detect it automatically)
helm install my-redis bitnami/redis \
--namespace default \
--set architecture=standalone \
--set auth.enabled=false
- Go back to the Kubeapps dashboard
- Click on the Applications tab
- You should see both `my-nginx` and `my-redis` listed
Step 07 - Upgrade an Application¶
Via the Dashboard¶
- Click on Applications tab
- Click on my-nginx
- Click Upgrade
- Modify the values (e.g., change `replicaCount` to `3`)
- Click Deploy to apply the upgrade
Via CLI (also reflected in the dashboard)¶
# Upgrade Redis to enable auth
helm upgrade my-redis bitnami/redis \
--set architecture=standalone \
--set auth.enabled=true \
--set auth.password=mypassword
# Check the upgrade in the dashboard
# The revision number should increment
helm history my-redis
Step 08 - Add a Custom Repository¶
Via the Dashboard¶
- Click on Configuration in the navigation (gear icon)
- Click App Repositories
- Click Add App Repository
- Fill in:
- Name: `codecentric`
- URL: `https://codecentric.github.io/helm-charts`
- Click Install Repository
Via kubectl¶
# Add a custom repository via kubectl manifest
cat <<EOF | kubectl apply -n kubeapps -f -
apiVersion: kubeapps.com/v1alpha1
kind: AppRepository
metadata:
name: codecentric
namespace: kubeapps
spec:
url: https://codecentric.github.io/helm-charts
EOF
- Go to the Catalog tab and you should now see charts from the new repository.
Step 09 - View Application Details¶
- In the Kubeapps dashboard, click on any deployed application to see:
- Status: The current state of the Helm release
- Resources: All Kubernetes resources created by the chart (Pods, Services, ConfigMaps, etc.)
- Notes: Post-install notes from the chart
- Values: The configuration values used for the deployment
- Revision History: All previous versions and their configurations
Compare with CLI output¶
# View the same information from the command line
helm status my-nginx
helm get values my-nginx
helm history my-nginx
kubectl get all -l app.kubernetes.io/instance=my-nginx
Step 10 - Delete an Application¶
Via the Dashboard¶
- Click on Applications tab
- Click on the application you want to delete (e.g., my-nginx)
- Click Delete
- Confirm the deletion
Via CLI¶
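The same deletion can be done with Helm (the cleanup section of this lab uses the same command):

```shell
# Uninstall the release; this removes all resources created by the chart
helm uninstall my-nginx
```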
Step 11 - Create a Restricted User¶
Create a ServiceAccount with limited permissions (namespace-scoped only):
# Create a namespace for the restricted user
kubectl create namespace team-dev
# Create a ServiceAccount
kubectl create serviceaccount kubeapps-dev-user -n team-dev
# Create a Role with limited permissions
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubeapps-dev-role
  namespace: team-dev
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
EOF
# Bind the role to the ServiceAccount
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubeapps-dev-binding
  namespace: team-dev
subjects:
  - kind: ServiceAccount
    name: kubeapps-dev-user
    namespace: team-dev
roleRef:
  kind: Role
  name: kubeapps-dev-role
  apiGroup: rbac.authorization.k8s.io
EOF
# Create a token for the dev user
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: kubeapps-dev-token
  namespace: team-dev
  annotations:
    kubernetes.io/service-account.name: kubeapps-dev-user
type: kubernetes.io/service-account-token
EOF
# Get the token
kubectl get secret kubeapps-dev-token \
-n team-dev \
-o go-template='{{.data.token | base64decode}}'
- Log out of Kubeapps and log in with this new token
- You should only be able to deploy applications to the `team-dev` namespace
Exercises¶
The following exercises will test your understanding of Kubeapps.
Try to solve each exercise on your own before revealing the solution.
01. Deploy WordPress with Custom Values¶
Deploy a WordPress instance using the Kubeapps dashboard with custom database credentials and a specific number of replicas.
Scenario:¶
- Your team needs a quick WordPress deployment for a staging environment.
- You need to customize the database password and blog name before deploying.
Hint: Search for “wordpress” in the catalog, modify the wordpressPassword, wordpressBlogName, and replicaCount values.
Solution
# Option 1: Via the Kubeapps UI
# 1. Go to Catalog > search "wordpress"
# 2. Select bitnami/wordpress
# 3. Click Deploy
# 4. In the values editor, set:
# - wordpressUsername: admin
# - wordpressPassword: my-staging-password
# - wordpressBlogName: "Staging Blog"
# - replicaCount: 2
# 5. Click Deploy
# Option 2: Via CLI (reflected in the dashboard)
helm install my-wordpress bitnami/wordpress \
--namespace default \
--set wordpressUsername=admin \
--set wordpressPassword=my-staging-password \
--set wordpressBlogName="Staging Blog" \
--set replicaCount=2
# Verify the deployment
kubectl get pods -l app.kubernetes.io/instance=my-wordpress
kubectl get svc -l app.kubernetes.io/instance=my-wordpress
# Access WordPress via port-forward
kubectl port-forward svc/my-wordpress 8081:80
# Open http://localhost:8081 in your browser
# Clean up
helm uninstall my-wordpress
02. Add a Private Helm Repository¶
Add a private Helm repository to Kubeapps that requires authentication. Use basic auth credentials.
Scenario:¶
- Your organization hosts internal Helm charts in a private repository.
- The repository requires basic authentication (username/password).
Hint: Create a Kubernetes Secret with auth credentials first, then reference it in the AppRepository.
Solution
# 1. Create a secret with the repository credentials
kubectl create secret generic my-private-repo-auth \
-n kubeapps \
--from-literal=authorizationHeader="Basic $(echo -n 'myuser:mypassword' | base64)"
# 2. Create the AppRepository with auth reference
cat <<EOF | kubectl apply -n kubeapps -f -
apiVersion: kubeapps.com/v1alpha1
kind: AppRepository
metadata:
  name: my-private-repo
  namespace: kubeapps
spec:
  url: https://charts.example.com
  auth:
    header:
      secretKeyRef:
        name: my-private-repo-auth
        key: authorizationHeader
EOF
# 3. Verify the repository was added
kubectl get apprepositories -n kubeapps
# 4. Check the sync status
kubectl get pods -n kubeapps -l app=apprepo-sync-my-private-repo
# Clean up
kubectl delete apprepository my-private-repo -n kubeapps
kubectl delete secret my-private-repo-auth -n kubeapps
03. Upgrade an Application and Rollback¶
Deploy PostgreSQL, upgrade it with new configuration, then rollback to the previous version.
Scenario:¶
- You deployed PostgreSQL with default settings.
- After upgrading with new memory limits, the pods fail to start.
- You need to rollback to the working version.
Hint: Use the Kubeapps UI or helm rollback to revert to a previous revision.
Solution
# 1. Deploy PostgreSQL
helm install my-postgres bitnami/postgresql \
--namespace default \
--set auth.postgresPassword=initial-password
# Verify it's running
kubectl get pods -l app.kubernetes.io/instance=my-postgres
# 2. Upgrade with new configuration
helm upgrade my-postgres bitnami/postgresql \
--set auth.postgresPassword=initial-password \
--set primary.resources.requests.memory=2Gi \
--set primary.resources.limits.memory=4Gi
# 3. Check the history
helm history my-postgres
# 4. Rollback to the previous revision (via CLI)
helm rollback my-postgres 1
# OR via the Kubeapps dashboard:
# - Go to Applications > my-postgres
# - Click on the revision dropdown
# - Select revision 1
# - Click Rollback
# 5. Verify the rollback
helm history my-postgres
kubectl get pods -l app.kubernetes.io/instance=my-postgres
# Clean up
helm uninstall my-postgres
04. Multi-Namespace Deployment¶
Create two separate ServiceAccounts with permissions for different namespaces, deploy applications in each namespace using the appropriate token.
Scenario:¶
- Your organization has two teams:
team-frontendandteam-backend. - Each team should only be able to deploy to their own namespace.
- Test that RBAC prevents cross-namespace deployments.
Hint: Create two namespaces, two ServiceAccounts, and namespace-scoped RoleBindings.
Solution
# 1. Create namespaces
kubectl create namespace team-frontend
kubectl create namespace team-backend
# 2. Create ServiceAccounts
kubectl create serviceaccount frontend-deployer -n team-frontend
kubectl create serviceaccount backend-deployer -n team-backend
# 3. Create namespace-scoped Roles and RoleBindings
for TEAM in frontend backend; do
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployer-role
  namespace: team-${TEAM}
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployer-binding
  namespace: team-${TEAM}
subjects:
  - kind: ServiceAccount
    name: ${TEAM}-deployer
    namespace: team-${TEAM}
roleRef:
  kind: Role
  name: deployer-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Secret
metadata:
  name: ${TEAM}-deployer-token
  namespace: team-${TEAM}
  annotations:
    kubernetes.io/service-account.name: ${TEAM}-deployer
type: kubernetes.io/service-account-token
EOF
done
# 4. Get tokens for each team
echo "=== Frontend Token ==="
kubectl get secret frontend-deployer-token \
-n team-frontend \
-o go-template='{{.data.token | base64decode}}'
echo ""
echo "=== Backend Token ==="
kubectl get secret backend-deployer-token \
-n team-backend \
-o go-template='{{.data.token | base64decode}}'
# 5. Log into Kubeapps with the frontend token
# - You should only see the team-frontend namespace
# - Try deploying nginx to team-frontend (should succeed)
# - Try deploying to team-backend (should fail)
# Clean up
kubectl delete namespace team-frontend team-backend
05. Configure Kubeapps with Custom Values¶
Reinstall Kubeapps with custom configuration: enable Ingress, change the number of frontend replicas, and configure a specific set of default repositories.
Scenario:¶
- You’re setting up Kubeapps for a production environment.
- You need to configure it with Ingress, high availability, and specific repositories.
Hint: Create a custom values.yaml and pass it to helm upgrade --install.
Solution
# 1. Show the default values
helm show values bitnami/kubeapps > kubeapps-default-values.yaml
# 2. Create a custom values file
cat <<EOF > kubeapps-custom-values.yaml
# Frontend configuration
frontend:
  replicaCount: 2
# Ingress configuration
ingress:
  enabled: true
  hostname: kubeapps.local
  ingressClassName: nginx
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
# Default app repositories
apprepository:
  initialRepos:
    - name: bitnami
      url: https://charts.bitnami.com/bitnami
    - name: ingress-nginx
      url: https://kubernetes.github.io/ingress-nginx
    - name: jetstack
      url: https://charts.jetstack.io
    - name: prometheus-community
      url: https://prometheus-community.github.io/helm-charts
EOF
# 3. Upgrade Kubeapps with custom values
helm upgrade --install kubeapps bitnami/kubeapps \
--namespace kubeapps \
--create-namespace \
-f kubeapps-custom-values.yaml \
--wait
# 4. Verify the changes
kubectl get pods -n kubeapps
kubectl get ingress -n kubeapps
kubectl get apprepositories -n kubeapps
# Clean up
rm kubeapps-default-values.yaml kubeapps-custom-values.yaml
06. Monitor Application Health¶
Deploy an application through Kubeapps and use the dashboard to monitor its health. Intentionally break it and observe the status changes.
Scenario:¶
- You’ve deployed an application and need to ensure it stays healthy.
- You want to understand how Kubeapps reports application health status.
Hint: Deploy nginx, then force-delete a pod or scale down to 0 and observe the dashboard.
Solution
# 1. Deploy nginx via Kubeapps or CLI
helm install health-test bitnami/nginx \
--namespace default \
--set replicaCount=3
# 2. Check the healthy state in Kubeapps dashboard
# - Go to Applications > health-test
# - All pods should show as Running
# - Status should be "Deployed"
# 3. Break the application - delete a pod
kubectl delete pod -l app.kubernetes.io/instance=health-test --wait=false
# 4. Observe in the dashboard:
# - One pod will briefly show as Terminating
# - A new pod will appear as ContainerCreating
# - Eventually all pods return to Running
# 5. Scale down to 0 replicas
kubectl scale deployment health-test-nginx --replicas=0
# 6. Observe in the dashboard:
# - No pods running
# - Application shows degraded state
# 7. Scale back up
kubectl scale deployment health-test-nginx --replicas=3
# 8. Verify recovery in the dashboard
# Clean up
helm uninstall health-test
Finalize & Cleanup¶
- To remove all resources created by this lab:
# Remove deployed applications
helm uninstall my-nginx 2>/dev/null
helm uninstall my-redis 2>/dev/null
helm uninstall my-wordpress 2>/dev/null
helm uninstall my-postgres 2>/dev/null
helm uninstall health-test 2>/dev/null
# Remove Kubeapps
helm uninstall kubeapps -n kubeapps
# Remove namespaces
kubectl delete namespace kubeapps kubeapps-user team-dev 2>/dev/null
# Remove ClusterRoleBinding
kubectl delete clusterrolebinding kubeapps-operator 2>/dev/null
Troubleshooting¶
- Kubeapps pods not starting:
Check pod status and events:
kubectl get pods -n kubeapps
kubectl describe pod <pod-name> -n kubeapps
kubectl logs <pod-name> -n kubeapps
- Cannot log in to Kubeapps:
Ensure the ServiceAccount token is valid and the ClusterRoleBinding exists:
# Verify the token secret exists
kubectl get secret kubeapps-operator-token -n kubeapps-user
# Verify the ClusterRoleBinding exists
kubectl get clusterrolebinding kubeapps-operator
# Regenerate the token if needed
kubectl delete secret kubeapps-operator-token -n kubeapps-user
# Then recreate it (see Step 02)
- Catalog shows no charts:
The AppRepository controller may need time to sync. Check its logs:
# Check sync pods
kubectl get pods -n kubeapps -l app=apprepo
# Check controller logs
kubectl logs -n kubeapps -l app.kubernetes.io/component=apprepository-controller
# Verify AppRepositories exist
kubectl get apprepositories -n kubeapps
- Port-forward not working:
Ensure the service is running and no other process uses port 8080:
# Check the service exists
kubectl get svc kubeapps -n kubeapps
# Try a different local port
kubectl port-forward svc/kubeapps -n kubeapps 9090:80
- Application deployment fails from the dashboard:
Check the Kubeapps API server logs for details:
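Assuming the default release name, the API backend runs as the `kubeapps-internal-kubeappsapis` deployment (seen earlier in the pod listing):

```shell
# Inspect the Kubeapps API server logs for deployment errors
kubectl logs -n kubeapps deployment/kubeapps-internal-kubeappsapis
```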
- RBAC errors (403 Forbidden):
The token’s ServiceAccount lacks necessary permissions. Check and update the RoleBinding:
# Check current bindings
kubectl get clusterrolebindings | grep kubeapps
kubectl get rolebindings -n <target-namespace> | grep kubeapps
# Verify the ServiceAccount exists
kubectl get serviceaccount -n kubeapps-user
Next Steps¶
- Learn about Kubeapps multi-cluster support for managing apps across clusters
- Explore OCI registry support for Helm charts stored in container registries
- Integrate Kubeapps with OIDC providers (Dex, Keycloak) for SSO authentication
- Set up Kubeapps with Operators to manage operator-based applications
- Explore Carvel packages as an alternative packaging format
- Configure private chart repositories with Docker registry integration
Advanced
Custom Resource Definitions (CRD)¶
- `Custom Resource Definitions` (CRDs) were added in Kubernetes 1.7.
- CRDs add the ability to define custom objects/resources.
- In this lab we will learn how CRDs extend the Kubernetes API.
What will we learn?¶
- What a Custom Resource Definition (CRD) is
- How CRDs extend the Kubernetes API
- How custom resources are stored and managed
- How to interact with custom resources using `kubectl`
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster
Introduction¶
What is a Custom Resource Definition (CRD)?¶
- A resource is an endpoint in the Kubernetes API that stores a collection of API objects of a certain kind; for example, the built-in pods resource contains a collection of Pod objects.
- A custom resource is an extension of the Kubernetes API that is not necessarily available in a default Kubernetes installation. It represents a customization of a particular Kubernetes installation. However, many core Kubernetes functions are now built using custom resources, making Kubernetes more modular.
- Custom resources can appear and disappear in a running cluster through dynamic registration, and cluster admins can update custom resources independently of the cluster itself.
- Once a custom resource is installed, users can create and access its objects using `kubectl`, just as they do for built-in resources like Pods.
- Custom resource objects are stored in the `etcd` cluster with proper replication and lifecycle management, just like built-in objects.
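For a first look at the shape of a CRD, here is a minimal sketch based on the classic `CronTab` example from the Kubernetes docs (the `example.com` group and the fields are illustrative, not part of this lab's later exercises):

```yaml
# crd.yaml - registers a new resource type with the API server
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.example.com        # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
    shortNames: [ct]
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              cronSpec:
                type: string
              replicas:
                type: integer
---
# cr.yaml - an object of the new kind, stored in etcd like any built-in object
apiVersion: example.com/v1
kind: CronTab
metadata:
  name: my-crontab
spec:
  cronSpec: "* * * * */5"
  replicas: 2
```

After `kubectl apply -f crd.yaml`, the API server serves the new endpoint and `kubectl get crontabs` (or `kubectl get ct`) works just like a built-in resource.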
Node Affinity, Pod Affinity, Anti-Affinity, Taints & Tolerations¶
- Kubernetes provides a rich set of mechanisms to control where Pods are scheduled in your cluster.
- In this lab we will deep-dive into every scheduling constraint available: `nodeSelector`, Node Affinity, Pod Affinity, Pod Anti-Affinity, Taints, Tolerations, and Topology Spread Constraints.
- We will build real-world examples - from GPU node pools to zone-aware deployments and multi-tenant cluster isolation.
- By the end of this lab you will have mastered fine-grained Pod placement and be able to design sophisticated scheduling strategies for production clusters.
What will we learn?¶
- Why Pod scheduling constraints exist and when to use each mechanism
- `nodeSelector` - simple, label-based node filtering
- `Node Affinity` - expressive node selection with operators, required vs. preferred rules
- `Pod Affinity` - co-locate Pods with other Pods (same node, same zone)
- `Pod Anti-Affinity` - spread Pods away from each other
- `Taints` - repel Pods from Nodes
- `Tolerations` - allow Pods onto tainted Nodes
- `Topology Spread Constraints` - balance Pods evenly across topology domains
- Real-world patterns: GPU pools, zone spreading, multi-tenant isolation, database co-location
- How to combine all mechanisms for complex production requirements
- Scheduling internals and the decision pipeline
Official Documentation & References¶
| Resource | Link |
|---|---|
| Assign Pods to Nodes | kubernetes.io/docs |
| Node Affinity | kubernetes.io/docs/affinity |
| Taints and Tolerations | kubernetes.io/docs/taints |
| Topology Spread Constraints | kubernetes.io/docs/topology |
| kube-scheduler | kubernetes.io/docs/kube-scheduler |
| Pod Priority & Preemption | kubernetes.io/docs/priority |
| Well-Known Node Labels | kubernetes.io/docs/reference/labels |
Introduction¶
The Kubernetes Scheduling Pipeline¶
When you create a Pod, the kube-scheduler selects which Node it runs on. The scheduler runs through several phases:
flowchart TD
A["New Pod created\n(no NodeName)"] --> B["Filtering Phase\n(Predicates)"]
B --> C{"Any nodes\npassed filter?"}
C -- No --> D["Pod stays Pending\nEvent: FailedScheduling"]
C -- Yes --> E["Scoring Phase\n(Priorities)"]
E --> F["Highest-score Node\nselected"]
F --> G["Pod bound to Node\nkubelet starts Pod"]
subgraph "Filtering checks include:"
B1["nodeSelector / nodeName"]
B2["Node Affinity (required)"]
B3["Pod Affinity / Anti-Affinity (required)"]
B4["Taints / Tolerations"]
B5["Resource availability (CPU, Mem)"]
B6["Topology Spread Constraints"]
end
subgraph "Scoring factors include:"
E1["Node Affinity (preferred) weight"]
E2["Pod Affinity (preferred) weight"]
E3["Least requested resources"]
E4["Image locality"]
end
Scheduling Mechanism Overview¶
| Mechanism | Direction | Hardness | Scope |
|---|---|---|---|
| `nodeName` | Pod → Node | Hard | Single node |
| `nodeSelector` | Pod → Node | Hard | Label match |
| Node Affinity (required) | Pod → Node | Hard | Operators, multi-label |
| Node Affinity (preferred) | Pod → Node | Soft | With weights |
| Pod Affinity (required) | Pod → Pod | Hard | Topology domain |
| Pod Affinity (preferred) | Pod → Pod | Soft | With weights |
| Pod Anti-Affinity (required) | Pod → Pod | Hard | Topology domain |
| Pod Anti-Affinity (preferred) | Pod → Pod | Soft | With weights |
| Taint `NoSchedule` | Node repels Pod | Hard | New pods excluded |
| Taint `NoExecute` | Node repels Pod | Hard | Existing pods evicted |
| Taint `PreferNoSchedule` | Node repels Pod | Soft | Avoid if possible |
| Topology Spread | Pod distribution | Hard/Soft | Arbitrary topology |
Terminology¶
| Term | Description |
|---|---|
| Node | A physical or virtual machine in the Kubernetes cluster |
| Node Label | A key-value pair attached to a Node used for selection |
| Taint | A key-value-effect triple on a Node that repels Pods |
| Toleration | A key-value-effect triple on a Pod that permits scheduling on a tainted Node |
| Affinity | A set of rules the scheduler uses to prefer or require specific placement |
| Anti-Affinity | Rules to keep Pods away from specific locations or other Pods |
| topologyKey | A node label key that defines the topology domain (e.g., kubernetes.io/hostname, topology.kubernetes.io/zone) |
| Required (hard) | requiredDuringSchedulingIgnoredDuringExecution - the Pod won’t schedule if the rule can’t be satisfied |
| Preferred (soft) | preferredDuringSchedulingIgnoredDuringExecution - the scheduler tries to honor but will schedule anyway |
| IgnoredDuringExecution | Already-running Pods are NOT evicted if rules change after scheduling |
| weight | Integer 1–100 given to a preferred rule; used in scoring |
| Topology Spread Constraint | Rule that limits how unevenly Pods can be distributed across topology domains |
| maxSkew | Maximum difference in Pod count between the most and least loaded topology domain |
| whenUnsatisfiable | What to do when spread can’t be satisfied: DoNotSchedule (hard) or ScheduleAnyway (soft) |
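The spread-related terms above (`maxSkew`, `topologyKey`, `whenUnsatisfiable`) fit together as in this minimal sketch - a hypothetical Deployment that keeps replicas evenly balanced across zones:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spread-demo
spec:
  replicas: 6
  selector:
    matchLabels:
      app: spread-demo
  template:
    metadata:
      labels:
        app: spread-demo
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                        # zones may differ by at most 1 pod
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule  # hard rule; ScheduleAnyway = soft
        labelSelector:
          matchLabels:
            app: spread-demo
      containers:
      - name: app
        image: nginx:1.25
```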
Common kubectl Commands¶
kubectl label - Add, update, remove labels on nodes
Syntax: kubectl label nodes <node-name> <key>=<value>
Description: Labels are key-value pairs attached to Nodes (and any Kubernetes object). They are the foundation of all affinity rules and nodeSelector.
# List all nodes
kubectl get nodes
# Show all labels on nodes
kubectl get nodes --show-labels
# Label a node
kubectl label nodes node-1 environment=production
# Label with multiple keys at once
kubectl label nodes node-1 environment=production tier=frontend
# Overwrite an existing label (requires --overwrite)
kubectl label nodes node-1 environment=staging --overwrite
# Remove a label (append a minus sign)
kubectl label nodes node-1 environment-
# Label all nodes matching a selector
kubectl label nodes -l kubernetes.io/role=worker disk-type=ssd
# Show node labels formatted as a table
kubectl get nodes -o custom-columns=NAME:.metadata.name,LABELS:.metadata.labels
kubectl taint - Add and remove taints on nodes
Syntax: kubectl taint nodes <node-name> <key>=<value>:<effect>
Description: Taints prevent Pods from being scheduled on a Node unless they have a matching toleration.
# Add a taint with NoSchedule effect
kubectl taint nodes node-1 dedicated=gpu:NoSchedule
# Add a taint with NoExecute effect (evicts running pods)
kubectl taint nodes node-1 maintenance=true:NoExecute
# Add a taint with PreferNoSchedule effect (soft)
kubectl taint nodes node-1 spot-instance=true:PreferNoSchedule
# Remove a taint (append a minus sign after the effect)
kubectl taint nodes node-1 dedicated=gpu:NoSchedule-
# Remove all taints with a given key regardless of value/effect
kubectl taint nodes node-1 dedicated-
# Show all taints on all nodes
kubectl describe nodes | grep -A3 "Taints:"
# Taint ALL nodes in the cluster
kubectl taint nodes --all dedicated=shared:PreferNoSchedule
kubectl describe node - Inspect node labels, taints, and allocatable resources
# Full node description
kubectl describe node node-1
# Show just labels
kubectl get node node-1 -o jsonpath='{.metadata.labels}' | jq
# Show just taints
kubectl get node node-1 -o jsonpath='{.spec.taints}'
# Show topology labels (zone / region)
kubectl get nodes -o custom-columns=\
NAME:.metadata.name,\
REGION:.metadata.labels."topology\.kubernetes\.io/region",\
ZONE:.metadata.labels."topology\.kubernetes\.io/zone"
# Check which node a pod landed on
kubectl get pods -o wide
# Watch pod scheduling events
kubectl get events --sort-by='.lastTimestamp' | grep FailedScheduling
kubectl get - Filter pods and nodes by label
# List pods on a specific node
kubectl get pods --field-selector spec.nodeName=node-1
# List pods with nodeSelector labels
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeSelector}{"\n"}{end}'
# List all nodes with a specific label
kubectl get nodes -l environment=production
# List nodes with topology zone labels
kubectl get nodes -l topology.kubernetes.io/zone=us-east-1a
Part 01 - nodeSelector (Simple Node Selection)¶
- `nodeSelector` is the simplest way to constrain a Pod to specific Nodes.
- It is a map of label key-value pairs - the Pod will only be scheduled on Nodes that have all the specified labels.
- It is less expressive than Node Affinity but easier to read for simple cases.
Step 01.01 - Label a Node¶
# Label a node for SSD storage
kubectl label nodes node-1 disk-type=ssd
# Label another node for HDD storage
kubectl label nodes node-2 disk-type=hdd
# Verify
kubectl get nodes --show-labels | grep disk-type
Step 01.02 - Schedule a Pod with nodeSelector¶
# pod-nodeselector.yaml
apiVersion: v1
kind: Pod
metadata:
name: ssd-pod
spec:
nodeSelector:
disk-type: ssd # Must match this exact label
containers:
- name: app
image: nginx:1.25
kubectl apply -f pod-nodeselector.yaml
kubectl get pod ssd-pod -o wide # Verify it landed on an ssd node
Step 01.03 - nodeSelector vs Node Affinity¶
| Feature | `nodeSelector` | Node Affinity |
|---|---|---|
| Operators | `=` only | `In`, `NotIn`, `Exists`, `DoesNotExist`, `Gt`, `Lt` |
| Logic | AND (all labels must match) | AND within a term, OR between terms |
| Soft preferences | No | Yes (preferred) |
| Multiple label conditions | Yes (all must match) | Yes (with full OR/AND logic) |
- Prefer `Node Affinity` for all new workloads. Use `nodeSelector` only for simple backward-compatible cases.
Part 02 - Node Affinity¶
- `Node Affinity` is the next-generation `nodeSelector`.
- It lets you express complex label requirements using operators and supports both hard and soft rules.
- All Node Affinity rules live under `spec.affinity.nodeAffinity`.
Node Affinity Rule Types¶
| Rule | Description |
|---|---|
| `requiredDuringSchedulingIgnoredDuringExecution` | Hard - Pod won’t schedule if rule can’t be met |
| `preferredDuringSchedulingIgnoredDuringExecution` | Soft - scheduler tries to honor, but schedules anyway |
IgnoredDuringExecution
Both rule types have IgnoredDuringExecution. This means if a Node’s labels change after a Pod is scheduled, the Pod is not evicted. A future type requiredDuringSchedulingRequiredDuringExecution is planned but not yet stable.
Node Affinity Operators¶
| Operator | Description | Example |
|---|---|---|
| `In` | Label value is in the set | `environment In [production, staging]` |
| `NotIn` | Label value is NOT in the set | `environment NotIn [development]` |
| `Exists` | Label key exists (any value) | `gpu Exists` |
| `DoesNotExist` | Label key does not exist | `spot-node DoesNotExist` |
| `Gt` | Label value numerically > specified | `storage-gb Gt 100` |
| `Lt` | Label value numerically < specified | `storage-gb Lt 1000` |
Step 02.01 - Required Node Affinity (Hard Rule)¶
# Setup - label nodes
kubectl label nodes node-1 environment=production zone=us-east
kubectl label nodes node-2 environment=development zone=us-west
# pod-required-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
name: prod-pod
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: environment
operator: In
values:
- production
- staging # Pod can go to production OR staging nodes
containers:
- name: app
image: nginx:1.25
kubectl apply -f pod-required-affinity.yaml
kubectl get pod prod-pod -o wide
kubectl describe pod prod-pod | grep -E "Node:|Affinity"
Pod stays Pending if no matching Nodes exist
If no Node has environment=production or environment=staging, the Pod will remain in Pending state with event FailedScheduling: 0/N nodes are available: N node(s) didn't match Pod's node affinity/selector.
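Conversely, you can see the IgnoredDuringExecution half of the rule in action: relabeling the node after the Pod has been scheduled does not evict it (this sketch assumes the Pod landed on node-1).

```
# Change the node's label after scheduling
kubectl label nodes node-1 environment=development --overwrite
# The already-running pod is NOT evicted - affinity is only checked at scheduling time
kubectl get pod prod-pod
```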
Step 02.02 - Preferred Node Affinity (Soft Rule with Weight)¶
# pod-preferred-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
name: preferred-pod
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 80 # Strong preference (out of 100)
preference:
matchExpressions:
- key: environment
operator: In
values:
- production
- weight: 20 # Weak preference
preference:
matchExpressions:
- key: zone
operator: In
values:
- us-east
containers:
- name: app
image: nginx:1.25
- The scheduler adds the `weight` values as a bonus score for each Node that matches that preference.
- A Pod CAN be scheduled on a Node that matches neither preference - the rules are purely advisory.
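As a toy illustration of this scoring (the real kube-scheduler combines many scoring plugins; this isolates the preferred-affinity term only, and the node labels are hypothetical), each matching preference simply adds its weight to a node's score:

```python
# Toy model of how preferred node-affinity weights contribute to node scoring.
preferences = [
    {"weight": 80, "key": "environment", "values": {"production"}},
    {"weight": 20, "key": "zone", "values": {"us-east"}},
]

nodes = {
    "node-1": {"environment": "production", "zone": "us-east"},
    "node-2": {"environment": "production", "zone": "us-west"},
    "node-3": {"environment": "development", "zone": "us-east"},
}

def affinity_score(labels):
    # Each preference whose label matches adds its weight to the node's score.
    return sum(p["weight"] for p in preferences
               if labels.get(p["key"]) in p["values"])

scores = {name: affinity_score(labels) for name, labels in nodes.items()}
best = max(scores, key=scores.get)
print(scores)  # {'node-1': 100, 'node-2': 80, 'node-3': 20}
print(best)    # node-1
```

node-1 matches both preferences (80 + 20), so it wins; node-3 matches only the weaker zone preference but would still be eligible if the others were full.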
Step 02.03 - Combining Required and Preferred Rules¶
# pod-combined-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
name: combined-affinity-pod
spec:
affinity:
nodeAffinity:
# HARD: MUST be production or staging
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: environment
operator: In
values:
- production
- staging
# SOFT: prefer us-east zone within those nodes
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: zone
operator: In
values:
- us-east
containers:
- name: app
image: nginx:1.25
Step 02.04 - Understanding OR and AND Logic¶
nodeSelectorTerms - OR logic; matchExpressions - AND logic
- Multiple entries in `nodeSelectorTerms` are combined with OR - the Pod can match ANY of them.
- Multiple entries in a single `matchExpressions` list are combined with AND - ALL must be satisfied.
# OR logic example: pod can go to either SSD nodes OR GPU nodes
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions: # First term (SSD)
- key: disk-type
operator: In
values: [ssd]
- matchExpressions: # Second term (GPU) - OR with first
- key: hardware
operator: In
values: [gpu]
# AND logic example: pod must be on SSD AND production nodes
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions: # Both conditions in the same term = AND
- key: disk-type
operator: In
values: [ssd]
- key: environment
operator: In
values: [production]
Step 02.05 - NotIn Operator (Exclude Nodes)¶
# Avoid scheduling on spot/preemptible instances for critical workloads
apiVersion: v1
kind: Pod
metadata:
name: critical-pod
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-lifecycle
operator: NotIn
values:
- spot
- preemptible
containers:
- name: app
image: nginx:1.25
Step 02.06 - Exists and DoesNotExist Operators¶
# Must run on a node that has ANY value for the 'gpu' label
apiVersion: v1
kind: Pod
metadata:
name: any-gpu-pod
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: gpu
operator: Exists # Any value for 'gpu' label is acceptable
containers:
- name: app
image: nvidia/cuda:12.0-base
# Must run on a node that has NO 'special-hardware' label at all
apiVersion: v1
kind: Pod
metadata:
name: standard-pod
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: special-hardware
operator: DoesNotExist
containers:
- name: app
image: nginx:1.25
Step 02.07 - Gt and Lt Operators (Numeric Comparison)¶
# Label nodes with numeric values
kubectl label nodes node-1 memory-gb=16
kubectl label nodes node-2 memory-gb=64
kubectl label nodes node-3 memory-gb=128
# Schedule only on nodes with memory-gb > 32
apiVersion: v1
kind: Pod
metadata:
name: high-memory-pod
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: memory-gb
operator: Gt
values:
- "32" # Value must be a string, comparison is numeric
containers:
- name: app
image: nginx:1.25
# Schedule on nodes with memory-gb between 32 and 256 (exclusive)
apiVersion: v1
kind: Pod
metadata:
name: medium-memory-pod
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: memory-gb
operator: Gt
values: ["32"]
- key: memory-gb
operator: Lt
values: ["256"]
containers:
- name: app
image: nginx:1.25
Step 02.08 - Node Affinity in a Deployment¶
# deployment-node-affinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-app
spec:
replicas: 3
selector:
matchLabels:
app: web-app
template:
metadata:
labels:
app: web-app
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: environment
operator: In
values: [production]
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 50
preference:
matchExpressions:
- key: zone
operator: In
values: [us-east-1a]
- weight: 50
preference:
matchExpressions:
- key: zone
operator: In
values: [us-east-1b]
containers:
- name: web
image: nginx:1.25
ports:
- containerPort: 80
Part 03 - Pod Affinity (Co-locate Pods)¶
- `Pod Affinity` allows you to influence the scheduler to place Pods near other Pods.
- This is useful when Pods benefit from being on the same Node or in the same topology domain (e.g., same availability zone) to reduce network latency.
- Pod Affinity rules live under `spec.affinity.podAffinity`.
How topologyKey Works¶
flowchart LR
subgraph "Zone us-east-1a"
subgraph "node-1"
P1["frontend-abc\napp=frontend"]
P2["cache-xyz (new)"]
end
subgraph "node-2"
P3["frontend-def\napp=frontend"]
P4["cache-uvw (new)"]
end
end
subgraph "Zone us-east-1b"
subgraph "node-3"
P5["backend-pod"]
end
end
note["topologyKey: kubernetes.io/hostname\n→ Co-locate on same NODE as matching Pod\n\ntopologyKey: topology.kubernetes.io/zone\n→ Co-locate in same ZONE as matching Pod"]
Common topologyKey values
- `kubernetes.io/hostname` - same physical/virtual Node
- `topology.kubernetes.io/zone` - same availability zone
- `topology.kubernetes.io/region` - same cloud region
- Any custom label key on your Nodes
Step 03.01 - Required Pod Affinity (Co-locate on Same Node)¶
# two-pods-same-node.yaml
# First: deploy the "anchor" pod that others want to co-locate with
apiVersion: v1
kind: Pod
metadata:
name: cache-pod
labels:
app: cache
tier: caching
spec:
containers:
- name: redis
image: redis:7
ports:
- containerPort: 6379
---
# Second: deploy the app that MUST be on the same Node as a cache pod
apiVersion: v1
kind: Pod
metadata:
name: app-pod
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- cache
topologyKey: kubernetes.io/hostname # Same node as a cache pod
containers:
- name: app
image: nginx:1.25
Step 03.02 - Required Pod Affinity (Co-locate in Same Zone)¶
# frontend-with-zone-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
name: frontend-pod
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: backend # Must be in same zone as a backend pod
topologyKey: topology.kubernetes.io/zone
containers:
- name: frontend
image: nginx:1.25
Step 03.03 - Preferred Pod Affinity¶
# pod-preferred-pod-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
name: app-prefer-near-cache
spec:
affinity:
podAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- cache
topologyKey: kubernetes.io/hostname # Prefer same node, but not required
containers:
- name: app
image: nginx:1.25
Step 03.04 - Pod Affinity in a Deployment (Sidecar Co-location Pattern)¶
# sidecar-affinity-deployment.yaml
# This deployment ensures each app pod is on the same node as a log-agent pod
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-with-sidecar-affinity
spec:
replicas: 3
selector:
matchLabels:
app: my-app
template:
metadata:
labels:
app: my-app
spec:
affinity:
podAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: log-agent
topologyKey: kubernetes.io/hostname
containers:
- name: app
image: nginx:1.25
Step 03.05 - Scoping Pod Affinity to Specific Namespaces¶
# Pod affinity targeting pods in a specific namespace
apiVersion: v1
kind: Pod
metadata:
name: cross-namespace-affinity-pod
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: database
topologyKey: kubernetes.io/hostname
namespaces: # Look for matching pods in these namespaces
- data-plane
- production
# If 'namespaces' is omitted, only the Pod's own namespace is searched
# If 'namespaces' is empty [], all namespaces are searched
containers:
- name: app
image: nginx:1.25
Part 04 - Pod Anti-Affinity (Spread Pods Apart)¶
- `Pod Anti-Affinity` is the opposite of `Pod Affinity` - it ensures Pods are placed away from other Pods.
- Its primary use is high availability: spreading replicas across Nodes, zones, or regions so a single failure doesn’t take down your entire service.
- Anti-Affinity rules live under `spec.affinity.podAntiAffinity`.
Step 04.01 - Required Anti-Affinity (One Pod per Node)¶
# deployment-one-per-node.yaml
# This deployment ensures no two replicas end up on the same node
apiVersion: apps/v1
kind: Deployment
metadata:
name: ha-nginx
spec:
replicas: 3
selector:
matchLabels:
app: ha-nginx
template:
metadata:
labels:
app: ha-nginx
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- ha-nginx # Avoid nodes that already have this app
topologyKey: kubernetes.io/hostname
containers:
- name: nginx
image: nginx:1.25
ports:
- containerPort: 80
kubectl apply -f deployment-one-per-node.yaml
kubectl get pods -o wide # Each pod should land on a different node
Required Anti-Affinity can leave Pods Pending
If you have replicas: 5 but only 3 Nodes, 2 Pods will stay Pending because no Node is available that doesn’t already have a matching Pod. Use preferredDuringSchedulingIgnoredDuringExecution if this is a concern.
Step 04.02 - Preferred Anti-Affinity (Prefer Different Nodes)¶
# deployment-prefer-spread.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-spread
spec:
replicas: 5
selector:
matchLabels:
app: web-spread
template:
metadata:
labels:
app: web-spread
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: web-spread
topologyKey: kubernetes.io/hostname
containers:
- name: web
image: nginx:1.25
Step 04.03 - Zone-Level Anti-Affinity (HA Across Zones)¶
# First, verify your nodes have zone labels
kubectl get nodes -o custom-columns=\
NAME:.metadata.name,\
ZONE:.metadata.labels."topology\.kubernetes\.io/zone"
# deployment-zone-ha.yaml
# Ensures each replica lands in a different availability zone
apiVersion: apps/v1
kind: Deployment
metadata:
name: zone-ha-app
spec:
replicas: 3
selector:
matchLabels:
app: zone-ha-app
template:
metadata:
labels:
app: zone-ha-app
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: zone-ha-app
topologyKey: topology.kubernetes.io/zone # One per zone
containers:
- name: app
image: nginx:1.25
Step 04.04 - Combining Pod Affinity and Anti-Affinity¶
# This pod:
# - MUST be in the same zone as at least one 'backend' pod (affinity)
# - MUST NOT be on the same node as another 'frontend' pod (anti-affinity)
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
spec:
replicas: 3
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: backend
topologyKey: topology.kubernetes.io/zone
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: frontend
topologyKey: kubernetes.io/hostname
containers:
- name: frontend
image: nginx:1.25
Part 05 - Taints and Tolerations¶
- `Taints` are applied to Nodes and repel Pods. `Tolerations` are applied to Pods and allow them to be scheduled onto tainted Nodes.
- Together they implement a whitelist model: a tainted Node only accepts Pods that explicitly opt in via tolerations.
- This is the opposite of Node Affinity (which is a pull mechanism from the Pod side).
flowchart LR
subgraph "Tainted Node"
N["node-1\nTaint: dedicated=gpu:NoSchedule"]
end
P1["Pod A\n(no toleration)"] --> X["❌ Cannot schedule\non node-1"]
P2["Pod B\ntoleration: dedicated=gpu:NoSchedule"] --> N
X -.-> OtherNode["Scheduled on\nother node"]
Taint Effects¶
| Effect | New Pods | Existing Pods |
|---|---|---|
| `NoSchedule` | Blocked (hard) | Not affected |
| `PreferNoSchedule` | Avoided (soft) | Not affected |
| `NoExecute` | Blocked (hard) | Evicted if no matching toleration |
Step 05.01 - Apply and Remove Taints¶
# Add NoSchedule taint
kubectl taint nodes node-1 dedicated=gpu:NoSchedule
# Add NoExecute taint (evicts non-tolerating existing pods immediately)
kubectl taint nodes node-2 maintenance=true:NoExecute
# Add PreferNoSchedule taint (soft - pods go elsewhere if possible)
kubectl taint nodes node-3 spot-instance=true:PreferNoSchedule
# View taints on all nodes
kubectl describe nodes | grep -A2 "Taints:"
# Remove a specific taint
kubectl taint nodes node-1 dedicated=gpu:NoSchedule-
# Remove all taints with a key (any value or effect)
kubectl taint nodes node-1 dedicated-
Step 05.02 - Pod Without Toleration¶
# pod-no-toleration.yaml
# This pod cannot be scheduled on node-1 which has dedicated=gpu:NoSchedule
apiVersion: v1
kind: Pod
metadata:
name: regular-pod
spec:
containers:
- name: app
image: nginx:1.25
kubectl apply -f pod-no-toleration.yaml
# If ALL nodes are tainted, the pod stays Pending
kubectl describe pod regular-pod | grep -A5 "Events:"
# Output: 0/N nodes are available: N node(s) had untolerated taint {dedicated: gpu}
Step 05.03 - Pod With Equal Toleration¶
# pod-equal-toleration.yaml
apiVersion: v1
kind: Pod
metadata:
name: gpu-app
spec:
tolerations:
- key: "dedicated"
operator: "Equal" # Must match both key AND value
value: "gpu"
effect: "NoSchedule"
containers:
- name: app
image: nginx:1.25
Step 05.04 - Pod With Exists Toleration¶
# pod-exists-toleration.yaml
# Tolerates ANY taint with key 'dedicated', regardless of value
apiVersion: v1
kind: Pod
metadata:
name: flexible-pod
spec:
tolerations:
- key: "dedicated"
operator: "Exists" # Matches any value for this key
effect: "NoSchedule"
containers:
- name: app
image: nginx:1.25
Step 05.05 - Tolerate All Taints (Wildcard)¶
# This pod tolerates ALL taints on ALL nodes
# WARNING: Only use for cluster-system pods (DaemonSets, CNI, etc.)
apiVersion: v1
kind: Pod
metadata:
name: omnipotent-pod
spec:
tolerations:
- operator: "Exists" # No key, no effect = match anything
containers:
- name: app
image: nginx:1.25
Tolerate-All is dangerous
Toleration operator: Exists with no key matches ALL taints. This is used by DaemonSets that must run everywhere (kube-proxy, CNI plugins, log agents). Never use it in application workloads.
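For reference, the tolerate-all pattern is how node agents shipped as DaemonSets end up on every node; a minimal sketch (the agent name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-log-agent
spec:
  selector:
    matchLabels:
      app: node-log-agent
  template:
    metadata:
      labels:
        app: node-log-agent
    spec:
      tolerations:
      - operator: "Exists"   # match every taint, so the agent runs on all nodes
      containers:
      - name: agent
        image: busybox
        command: ["sh", "-c", "tail -f /dev/null"]
```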
Step 05.06 - NoExecute Effect and tolerationSeconds¶
# pod-maintenance-toleration.yaml
# This pod has 10 minutes to finish before being evicted during maintenance
apiVersion: v1
kind: Pod
metadata:
name: long-running-job
spec:
tolerations:
- key: "maintenance"
operator: "Equal"
value: "true"
effect: "NoExecute"
tolerationSeconds: 600 # Stay up to 10 minutes after the taint is applied
containers:
- name: job
image: busybox
command: ["sleep", "infinity"]
# Simulate Node maintenance - taint the node
kubectl taint nodes node-1 maintenance=true:NoExecute
# The pod will continue running for up to 600 seconds, then be evicted
kubectl get pods -o wide --watch
Step 05.07 - Multiple Tolerations on a Single Pod¶
# pod-multi-toleration.yaml
# This pod can run on nodes with multiple specific taints
apiVersion: v1
kind: Pod
metadata:
name: multi-tolerant-pod
spec:
tolerations:
- key: "dedicated"
operator: "Equal"
value: "gpu"
effect: "NoSchedule"
- key: "nvidia.com/gpu"
operator: "Exists"
effect: "NoSchedule"
- key: "maintenance"
operator: "Equal"
value: "true"
effect: "NoExecute"
tolerationSeconds: 60
containers:
- name: gpu-app
image: nginx:1.25
Step 05.08 - Tolerating Multiple Effects for the Same Key¶
# Tolerate both NoSchedule and NoExecute for the same key
apiVersion: v1
kind: Pod
metadata:
name: dual-effect-pod
spec:
tolerations:
- key: "node-role"
operator: "Equal"
value: "edge"
effect: "NoSchedule"
- key: "node-role"
operator: "Equal"
value: "edge"
effect: "NoExecute"
# Shortcut: omit 'effect' to match ALL effects for this key/value pair
# - key: "node-role"
# operator: "Equal"
# value: "edge"
containers:
- name: app
image: nginx:1.25
Step 05.09 - Built-in Kubernetes Taints¶
Kubernetes automatically adds these taints to Nodes in various states:
| Taint | Effect | Added when |
|---|---|---|
| `node.kubernetes.io/not-ready` | `NoExecute` | Node Ready condition is False |
| `node.kubernetes.io/unreachable` | `NoExecute` | Node Ready condition is Unknown |
| `node.kubernetes.io/memory-pressure` | `NoSchedule` | Node has memory pressure |
| `node.kubernetes.io/disk-pressure` | `NoSchedule` | Node has disk pressure |
| `node.kubernetes.io/pid-pressure` | `NoSchedule` | Node has PID pressure |
| `node.kubernetes.io/unschedulable` | `NoSchedule` | Node is cordoned |
| `node.kubernetes.io/network-unavailable` | `NoSchedule` | Node network not configured |
| `node.cloudprovider.kubernetes.io/uninitialized` | `NoSchedule` | Cloud provider not finished initializing |
# Application pods should tolerate node.kubernetes.io/not-ready briefly
# to avoid unnecessary rescheduling during transient node issues
apiVersion: apps/v1
kind: Deployment
metadata:
name: resilient-app
spec:
replicas: 3
selector:
matchLabels:
app: resilient-app
template:
metadata:
labels:
app: resilient-app
spec:
tolerations:
# Kubernetes default tolerations - pods wait 5 minutes before rescheduling
- key: "node.kubernetes.io/not-ready"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 300
- key: "node.kubernetes.io/unreachable"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 300
containers:
- name: app
image: nginx:1.25
Step 05.10 - Taint-based Node Isolation for Control Plane¶
# Control-plane nodes are automatically tainted:
kubectl describe node <control-plane> | grep Taints
# Taints: node-role.kubernetes.io/control-plane:NoSchedule
# Allow a pod to run on control-plane nodes (e.g., for monitoring)
apiVersion: v1
kind: Pod
metadata:
name: control-plane-monitor
spec:
tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
nodeSelector:
node-role.kubernetes.io/control-plane: ""
containers:
- name: monitor
image: busybox
command: ["sleep", "infinity"]
Part 06 - Combining Node Affinity, Taints, and Tolerations¶
- Using `Node Affinity` alone means Pods from other teams can also go to those nodes.
- Using `Taints` alone ensures no uninvited Pods land there, but your own Pods still won’t be attracted to the nodes.
- The pattern is: Taint the node (repel others) + use Affinity (attract your pods).
flowchart TD
subgraph "Dedicated GPU Node Pool"
N["gpu-node-1\nTaint: dedicated=gpu:NoSchedule\nLabel: hardware=gpu"]
end
P1["Random Pod (no toleration)"] --> X["❌ Blocked by Taint"]
P2["GPU Workload\n(toleration + node affinity)"] --> N
style N fill:#2d6a2d,color:#fff
Step 06.01 - Dedicated Node Pool (Taint + Node Affinity)¶
# Setup dedicated GPU nodes
kubectl label nodes gpu-node-1 hardware=gpu accelerator=nvidia
kubectl label nodes gpu-node-2 hardware=gpu accelerator=nvidia
# Taint them to repel non-GPU workloads
kubectl taint nodes gpu-node-1 dedicated=gpu:NoSchedule
kubectl taint nodes gpu-node-2 dedicated=gpu:NoSchedule
# gpu-workload.yaml
apiVersion: v1
kind: Pod
metadata:
name: ml-training-job
spec:
tolerations:
- key: "dedicated"
operator: "Equal"
value: "gpu"
effect: "NoSchedule"
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: hardware
operator: In
values: [gpu]
- key: accelerator
operator: In
values: [nvidia]
containers:
- name: train
image: nvidia/cuda:12.0-base
resources:
limits:
nvidia.com/gpu: "1"
Step 06.02 - Multi-Tenant Cluster Team Isolation¶
# Allocate nodes to team-alpha
kubectl label nodes node-1 node-2 team=alpha
kubectl taint nodes node-1 node-2 team=alpha:NoSchedule
# Allocate nodes to team-beta
kubectl label nodes node-3 node-4 team=beta
kubectl taint nodes node-3 node-4 team=beta:NoSchedule
# team-alpha deployment - only lands on alpha nodes
apiVersion: apps/v1
kind: Deployment
metadata:
name: alpha-service
namespace: team-alpha
spec:
replicas: 2
selector:
matchLabels:
app: alpha-service
template:
metadata:
labels:
app: alpha-service
spec:
tolerations:
- key: "team"
operator: "Equal"
value: "alpha"
effect: "NoSchedule"
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: team
operator: In
values: [alpha]
containers:
- name: service
image: nginx:1.25
Step 06.03 - Spot Instance Nodes with Graceful Fallback¶
# Label and taint spot instances
kubectl label nodes spot-1 spot-2 node-lifecycle=spot
kubectl taint nodes spot-1 spot-2 spot-instance=true:PreferNoSchedule
# batch-job.yaml - prefer spot, but fall back to on-demand
apiVersion: batch/v1
kind: Job
metadata:
name: batch-data-processor
spec:
template:
spec:
tolerations:
- key: "spot-instance"
operator: "Equal"
value: "true"
effect: "PreferNoSchedule"
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: node-lifecycle
operator: In
values: [spot]
restartPolicy: OnFailure
containers:
- name: processor
image: busybox
command: ["sh", "-c", "echo processing && sleep 60"]
# critical-service.yaml - explicitly AVOID spot instances
apiVersion: apps/v1
kind: Deployment
metadata:
name: critical-api
spec:
replicas: 3
selector:
matchLabels:
app: critical-api
template:
metadata:
labels:
app: critical-api
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-lifecycle
operator: NotIn
values: [spot, preemptible]
containers:
- name: api
image: nginx:1.25
Part 07 - Topology Spread Constraints¶
- `Topology Spread Constraints` give you explicit control over how Pods are distributed across topology domains.
- Unlike Anti-Affinity (which blocks placement), Topology Spread Constraints let you specify the maximum imbalance allowed.
- They are more predictable and flexible than Anti-Affinity for distribution scenarios.
Key Fields¶
| Field | Description |
|---|---|
| `maxSkew` | Maximum difference in Pod count between the most and least loaded domains. `maxSkew: 1` means no domain can have more than 1 extra Pod |
| `topologyKey` | The node label that defines the topology domains (e.g., zone, hostname) |
| `whenUnsatisfiable` | `DoNotSchedule` (hard) or `ScheduleAnyway` (soft, best effort) |
| `labelSelector` | Which Pods to count when computing skew |
| `minDomains` | Minimum number of topology domains that must be available (requires at least this many domains to exist) |
| `nodeAffinityPolicy` | Whether to honor nodeAffinity/nodeSelector when counting pods: `Honor` (default) or `Ignore` |
| `nodeTaintsPolicy` | Whether to honor taints when counting: `Honor` or `Ignore` (default) |
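A minimal sketch showing the two counting policies in use. These fields are part of the Pod spec in recent Kubernetes versions (beta since v1.26); the Deployment name and labels are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: policy-spread-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: policy-spread-app
  template:
    metadata:
      labels:
        app: policy-spread-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          nodeAffinityPolicy: Honor   # only count nodes matching this Pod's nodeAffinity
          nodeTaintsPolicy: Honor     # only count nodes whose taints this Pod tolerates
          labelSelector:
            matchLabels:
              app: policy-spread-app
      containers:
        - name: app
          image: nginx:1.25
```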
Step 07.01 - Basic Zone Spreading¶
# deployment-zone-spread.yaml
# Spread pods evenly across zones, allowing max 1 extra pod in any zone
apiVersion: apps/v1
kind: Deployment
metadata:
name: zone-spread-app
spec:
replicas: 6
selector:
matchLabels:
app: zone-spread-app
template:
metadata:
labels:
app: zone-spread-app
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule # Hard constraint
labelSelector:
matchLabels:
app: zone-spread-app
containers:
- name: app
image: nginx:1.25
Step 07.02 - Node-Level Spreading¶
# deployment-node-spread.yaml
# Spread pods evenly across nodes (max 1 extra pod per node)
apiVersion: apps/v1
kind: Deployment
metadata:
name: node-spread-app
spec:
replicas: 9
selector:
matchLabels:
app: node-spread-app
template:
metadata:
labels:
app: node-spread-app
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname # Each node is a domain
whenUnsatisfiable: ScheduleAnyway # Soft - still schedules if can't spread
labelSelector:
matchLabels:
app: node-spread-app
containers:
- name: app
image: nginx:1.25
Step 07.03 - Multi-Level Constraints (Zone AND Node)¶
# deployment-multi-spread.yaml
# Spread evenly across zones first, then evenly across nodes within zones
apiVersion: apps/v1
kind: Deployment
metadata:
name: multi-level-spread
spec:
replicas: 12
selector:
matchLabels:
app: multi-level-spread
template:
metadata:
labels:
app: multi-level-spread
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone # Zone-level spread
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: multi-level-spread
- maxSkew: 1
topologyKey: kubernetes.io/hostname # Node-level spread
whenUnsatisfiable: ScheduleAnyway # Best effort for nodes
labelSelector:
matchLabels:
app: multi-level-spread
containers:
- name: app
image: nginx:1.25
Step 07.04 - minDomains (Ensure Minimum Domain Count)¶
# deployment-min-domains.yaml
# Only schedule if at least 3 zones are available
apiVersion: apps/v1
kind: Deployment
metadata:
name: min-domain-app
spec:
replicas: 6
selector:
matchLabels:
app: min-domain-app
template:
metadata:
labels:
app: min-domain-app
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
minDomains: 3 # Must have at least 3 zones available
labelSelector:
matchLabels:
app: min-domain-app
containers:
- name: app
image: nginx:1.25
Step 07.05 - Topology Spread Constraints vs Anti-Affinity¶
| Aspect | Anti-Affinity (required) | Topology Spread Constraint |
|---|---|---|
| Semantics | One pod per domain (binary) | Balance up to maxSkew |
| Flexibility | Rigid - pod Pending if >1 per domain | Flexible - maxSkew allows limited stacking |
| Multiple replicas per domain | Not possible (required) | Yes, balanced by maxSkew |
| Partial failure handling | Pod stays pending | ScheduleAnyway for soft behavior |
| Multiple topology levels | Requires chaining | Native multi-constraint support |
Part 08 - Real-World Scenarios¶
Scenario 08.01 - Database + Stateful Service Co-Location¶
# Deploy a Redis cache that MUST be co-located with application pods
# Application pods get ~1ms latency to Redis when on the same node
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: redis-local
spec:
selector:
matchLabels:
app: redis-local
template:
metadata:
labels:
app: redis-local
tier: cache
spec:
tolerations:
- operator: "Exists" # Run on all nodes including tainted ones
containers:
- name: redis
image: redis:7-alpine
ports:
- containerPort: 6379
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-app
spec:
replicas: 6
selector:
matchLabels:
app: web-app
template:
metadata:
labels:
app: web-app
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: redis-local # MUST be on same node as Redis
topologyKey: kubernetes.io/hostname
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: web-app # Prefer different nodes for web app replicas
topologyKey: kubernetes.io/hostname
containers:
- name: web
image: nginx:1.25
Scenario 08.02 - High-Availability Web Service (3 Zones)¶
# ha-web-service.yaml
# Web: must spread across 3 zones, one replica per zone minimum
apiVersion: apps/v1
kind: Deployment
metadata:
name: ha-web
spec:
replicas: 6
selector:
matchLabels:
app: ha-web
template:
metadata:
labels:
app: ha-web
tier: web
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: ha-web
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values: [web, general]
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
tier: web
topologyKey: kubernetes.io/hostname
containers:
- name: web
image: nginx:1.25
Scenario 08.03 - Node Maintenance (Drain Workflow)¶
# 1. Cordon the node (adds node.kubernetes.io/unschedulable taint)
kubectl cordon node-1
# 2. Drain the node (evicts all pods gracefully)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data --grace-period=60
# 3. Perform maintenance...
# 4. Uncordon the node to allow scheduling again
kubectl uncordon node-1
# Check scheduling status
kubectl get nodes
# STATUS: Ready (not SchedulingDisabled) = uncordoned
Scenario 08.04 - Burstable Workloads on Spot Instances¶
# production-baseline.yaml - always on on-demand nodes
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-baseline
spec:
replicas: 2
selector:
matchLabels:
app: api
tier: baseline
template:
metadata:
labels:
app: api
tier: baseline
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-lifecycle
operator: In
values: [on-demand]
containers:
- name: api
image: nginx:1.25
---
# burst-capacity.yaml - scale-out pods go to cheaper spot instances
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-burst
spec:
replicas: 0 # HPA will scale this up during bursts
selector:
matchLabels:
app: api
tier: burst
template:
metadata:
labels:
app: api
tier: burst
spec:
tolerations:
- key: "spot-instance"
operator: "Exists"
effect: "NoSchedule"
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: node-lifecycle
operator: In
values: [spot]
containers:
- name: api
image: nginx:1.25
Scenario 08.05 - Dedicated Nodes for Monitoring Stack¶
# Create a dedicated monitoring node pool
kubectl label nodes monitoring-1 monitoring-2 role=monitoring
kubectl taint nodes monitoring-1 monitoring-2 role=monitoring:NoSchedule
# prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus
namespace: monitoring
spec:
replicas: 2
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
tolerations:
- key: "role"
operator: "Equal"
value: "monitoring"
effect: "NoSchedule"
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: role
operator: In
values: [monitoring]
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: prometheus
topologyKey: kubernetes.io/hostname
containers:
- name: prometheus
image: prom/prometheus:latest
ports:
- containerPort: 9090
Part 09 - DaemonSets and System Pods¶
- `DaemonSets` run one Pod per Node.
- System DaemonSets (CNI, kube-proxy, log agents) need to run even on tainted Nodes.
- They use the wildcard toleration or specific tolerations for known system taints.
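The wildcard form is a toleration with `operator: Exists` and no `key`, which matches every taint; a minimal Pod-template fragment:

```yaml
# Fragment of a DaemonSet Pod template: tolerate ALL taints on ALL nodes
spec:
  tolerations:
    - operator: Exists   # no key and no effect => matches every taint
```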
Step 09.01 - DaemonSet on All Nodes Including Tainted¶
# node-log-agent.yaml
# Log agent that runs on ALL nodes, including control-plane and tainted nodes
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: log-agent
namespace: kube-system
spec:
selector:
matchLabels:
app: log-agent
template:
metadata:
labels:
app: log-agent
spec:
tolerations:
# Standard system node taints
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
# Node lifecycle taints
- key: node.kubernetes.io/not-ready
operator: Exists
effect: NoExecute
- key: node.kubernetes.io/unreachable
operator: Exists
effect: NoExecute
- key: node.kubernetes.io/disk-pressure
operator: Exists
effect: NoSchedule
- key: node.kubernetes.io/memory-pressure
operator: Exists
effect: NoSchedule
# Custom workload taints - agents still need to run on GPU/special nodes
- key: dedicated
operator: Exists
effect: NoSchedule
containers:
- name: log-agent
image: fluent/fluent-bit:latest
Step 09.02 - DaemonSet on a Subset of Nodes (Node Affinity)¶
# gpu-node-daemonset.yaml
# Driver installer that runs only on GPU nodes
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: nvidia-driver-installer
namespace: gpu-system
spec:
selector:
matchLabels:
app: nvidia-driver-installer
template:
metadata:
labels:
app: nvidia-driver-installer
spec:
tolerations:
- key: "dedicated"
operator: "Equal"
value: "gpu"
effect: "NoSchedule"
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: hardware
operator: In
values: [gpu]
containers:
- name: installer
image: nvidia/gpu-operator:latest
Part 10 - Observability and Debugging Scheduling Issues¶
Step 10.01 - Check Why a Pod is Pending¶
# Describe the pod for scheduling events
kubectl describe pod <pod-name>
# Look for the "Events" section - FailedScheduling explains why
# Common messages:
# "0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector"
# "0/3 nodes are available: 3 node(s) had untolerated taint {dedicated: gpu}"
# "0/3 nodes are available: 3 node(s) didn't match pod anti-affinity rules"
# "0/3 nodes are available: 1 Insufficient cpu, 2 Insufficient memory"
# Get all scheduler events
kubectl get events -n <namespace> --field-selector reason=FailedScheduling
# Sort by time
kubectl get events --sort-by='.lastTimestamp' | grep Failed
# Watch scheduling events in real time
kubectl get events -w | grep -E "FailedScheduling|Scheduled"
Step 10.02 - Verify Node Labels Match Affinity Rules¶
# Check if a node has the required label
kubectl get node node-1 -o jsonpath='{.metadata.labels}' | jq
# Check with a specific label key
kubectl get nodes -l environment=production
# Show which nodes match a specific label selector
kubectl get nodes --selector='environment in (production,staging)'
Step 10.03 - Verify Taints on Nodes¶
# Show taints on all nodes in a table
kubectl get nodes -o custom-columns=\
NAME:.metadata.name,\
TAINTS:.spec.taints
# Check if a pod's tolerations match node taints
kubectl get pod <pod-name> -o jsonpath='{.spec.tolerations}' | jq
kubectl get node <node-name> -o jsonpath='{.spec.taints}' | jq
Step 10.04 - Simulate Scheduling (Dry Run)¶
# Try applying a pod definition to see if it would schedule
kubectl apply -f pod.yaml --dry-run=server
# Use kubectl describe to see what nodes are eligible after a failed schedule
kubectl describe pod <pending-pod> | grep -A 20 "Events:"
Part 11 - Complete Scenario: Production-Grade Multi-Tier Application¶
This scenario combines everything to deploy a 3-tier application (frontend, backend, database) with:
- Zone-spread frontend pods
- Backend co-located in the same zones as frontend
- Database on dedicated storage nodes
- Monitoring agents on every node
- Spot instance Nodes for batch jobs
# ===== Setup: Label and taint cluster nodes =====
# Zone assignment (cloud providers set these automatically)
kubectl label nodes node-1 node-2 topology.kubernetes.io/zone=us-east-1a
kubectl label nodes node-3 node-4 topology.kubernetes.io/zone=us-east-1b
kubectl label nodes node-5 node-6 topology.kubernetes.io/zone=us-east-1c
# Node types
kubectl label nodes node-1 node-3 node-5 node-type=web
kubectl label nodes node-2 node-4 node-6 node-type=storage disk-type=ssd
# Storage nodes are dedicated
kubectl taint nodes node-2 node-4 node-6 dedicated=storage:NoSchedule
# frontend-deployment.yaml - spread across zones, one per node
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
spec:
replicas: 3
selector:
matchLabels:
app: frontend
tier: web
template:
metadata:
labels:
app: frontend
tier: web
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: frontend
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values: [web]
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: frontend
topologyKey: kubernetes.io/hostname
containers:
- name: frontend
image: nginx:1.25
ports:
- containerPort: 80
# backend-deployment.yaml - co-located in same zones as frontend
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend
spec:
replicas: 3
selector:
matchLabels:
app: backend
tier: api
template:
metadata:
labels:
app: backend
tier: api
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values: [web]
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
tier: web # Be in same zone as frontend
topologyKey: topology.kubernetes.io/zone
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: backend
topologyKey: kubernetes.io/hostname
containers:
- name: backend
image: nginx:1.25
ports:
- containerPort: 8080
# database-statefulset.yaml - dedicated storage nodes
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: database
spec:
serviceName: database
replicas: 3
selector:
matchLabels:
app: database
tier: data
template:
metadata:
labels:
app: database
tier: data
spec:
tolerations:
- key: "dedicated"
operator: "Equal"
value: "storage"
effect: "NoSchedule"
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values: [storage]
- key: disk-type
operator: In
values: [ssd]
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: database
topologyKey: topology.kubernetes.io/zone # One DB replica per zone
containers:
- name: db
image: postgres:16
env:
- name: POSTGRES_PASSWORD
value: "changeme"
Exercises¶
Exercise 1: Schedule a Pod only on nodes with ≥ 16 CPU cores labeled as cpu-cores
**Solution:**
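A minimal sketch of a solution, assuming the nodes carry a numeric label such as `cpu-cores=16` (the `Gt` operator compares label values as integers, so `Gt 15` matches 16 or more):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: high-cpu-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cpu-cores
                operator: Gt
                values: ["15"]   # Gt 15 => 16 cores or more
  containers:
    - name: app
      image: nginx:1.25
```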
Exercise 2: Deploy 4 replicas of an app with HARD anti-affinity across nodes, then observe Pending pods
**Solution:**
# First, deploy with 4 replicas - if there are fewer than 4 nodes, some will stay Pending
apiVersion: apps/v1
kind: Deployment
metadata:
name: anti-affinity-test
spec:
replicas: 4
selector:
matchLabels:
app: anti-affinity-test
template:
metadata:
labels:
app: anti-affinity-test
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: anti-affinity-test
topologyKey: kubernetes.io/hostname
containers:
- name: app
image: nginx:1.25
Exercise 3: Taint a node with NoExecute and observe pod eviction, then add a toleration with tolerationSeconds: 120
**Solution:**
# Deploy some pods on all nodes
kubectl create deployment test-evict --image=nginx --replicas=3
# Find which node one of the pods is on
kubectl get pods -o wide
# Taint that node with NoExecute - all pods without toleration will be evicted
kubectl taint nodes <node-name> eviction-test=true:NoExecute
# Watch the pods being evicted and rescheduled
kubectl get pods -o wide --watch
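For the second half of the exercise, a sketch of a Pod that tolerates the taint for 120 seconds before being evicted (key and value match the `eviction-test=true:NoExecute` taint applied above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod
spec:
  tolerations:
    - key: "eviction-test"
      operator: "Equal"
      value: "true"
      effect: "NoExecute"
      tolerationSeconds: 120  # evicted 120s after the taint appears
  containers:
    - name: app
      image: nginx:1.25
```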
Exercise 4: Deploy 9 pods spread evenly across 3 zones with maxSkew=1, then scale to 12 pods
**Solution:**
apiVersion: apps/v1
kind: Deployment
metadata:
name: spread-exercise
spec:
replicas: 9
selector:
matchLabels:
app: spread-exercise
template:
metadata:
labels:
app: spread-exercise
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: spread-exercise
containers:
- name: app
image: nginx:1.25
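To complete the exercise, scale to 12 replicas and watch placement; with 3 zones and `maxSkew: 1`, the 12 Pods settle at 4 per zone:

```shell
kubectl scale deployment spread-exercise --replicas=12

# Observe the zone of each pod as it is scheduled
kubectl get pods -l app=spread-exercise -o wide --watch
```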
Exercise 5: Create a "data locality pattern" - Redis DaemonSet + application pods that MUST be on the same node as Redis
**Solution:**
# Step 1: DaemonSet ensures Redis runs on every node
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: redis-local
spec:
selector:
matchLabels:
type: redis-local
template:
metadata:
labels:
type: redis-local
spec:
containers:
- name: redis
image: redis:7-alpine
ports:
- containerPort: 6379
---
# Step 2: App pods MUST be co-located with Redis (same node)
apiVersion: apps/v1
kind: Deployment
metadata:
name: latency-sensitive-app
spec:
replicas: 4
selector:
matchLabels:
app: latency-sensitive-app
template:
metadata:
labels:
app: latency-sensitive-app
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
type: redis-local # Must be on same node as Redis DaemonSet pod
topologyKey: kubernetes.io/hostname
containers:
- name: app
image: nginx:1.25
env:
- name: REDIS_HOST
value: "localhost" # Redis is on the same node
Cleanup¶
# Remove all test pods
kubectl delete pod --all -n default
# Remove test deployments
kubectl delete deployment ha-nginx web-spread zone-ha-app --ignore-not-found
kubectl delete deployment gpu-app multi-level-spread spread-exercise --ignore-not-found
kubectl delete deployment frontend backend ha-web critical-api api-baseline --ignore-not-found
kubectl delete statefulset database --ignore-not-found
kubectl delete daemonset redis-local log-agent --ignore-not-found
# Remove test labels from all nodes
kubectl label nodes --all environment- zone- disk-type- hardware- --overwrite 2>/dev/null || true
kubectl label nodes --all accelerator- team- node-lifecycle- role- --overwrite 2>/dev/null || true
kubectl label nodes --all node-type- memory-gb- cpu-cores- --overwrite 2>/dev/null || true
# Remove taints from nodes
kubectl taint nodes --all dedicated- 2>/dev/null || true
kubectl taint nodes --all maintenance- 2>/dev/null || true
kubectl taint nodes --all spot-instance- 2>/dev/null || true
kubectl taint nodes --all team- role- eviction-test- 2>/dev/null || true
# Verify nodes are clean
kubectl describe nodes | grep -A3 "Taints:"
kubectl get nodes --show-labels
Summary¶
In this lab you learned:
| Topic | Covered |
|---|---|
| `nodeSelector` | Simple exact-match node label filtering |
| Node Affinity | Rich operators (In, NotIn, Exists, DoesNotExist, Gt, Lt), required + preferred, weight-based scoring, AND/OR logic |
| Pod Affinity | Co-locate Pods on same node or same topology domain, namespace scoping |
| Pod Anti-Affinity | Spread Pods across nodes and zones, hard + soft variants |
| Taints | Node-side repulsion with NoSchedule, PreferNoSchedule, NoExecute effects |
| Tolerations | Pod-side opt-in with Equal/Exists operators, tolerationSeconds |
| Built-in Taints | System lifecycle taints (not-ready, unreachable, disk-pressure, etc.) |
| Topology Spread | Balance Pod counts across topology domains with maxSkew, minDomains |
| Patterns | GPU pools, zone-HA, spot bursting, multi-tenant isolation, data locality |
| Debugging | FailedScheduling events, label/taint inspection, dry-run |
Decision Guide¶
flowchart TD
Q1{"Do you need to\ncontrol which NODES\nPods go to?"}
Q1 -- Yes --> Q2{"Simple exact\nlabel match?"}
Q2 -- Yes --> A1["Use nodeSelector"]
Q2 -- No --> A2["Use Node Affinity\n(required or preferred)"]
Q1 -- No --> Q3{"Do you need to\nplace Pods relative\nto OTHER Pods?"}
Q3 -- "Near each other" --> A3["Use Pod Affinity"]
Q3 -- "Away from each other" --> A4["Use Pod Anti-Affinity"]
Q3 -- "Balanced spread" --> A5["Use Topology Spread\nConstraints"]
Q3 -- No --> Q4{"Do you need to\nREPEL pods from\na node?"}
Q4 -- Yes --> A6["Use Taints on Node\n+ Tolerations on Pod"]
Q4 -- No --> A7["Default scheduler\nbehavior is fine"]
style A1 fill:#1a73e8,color:#fff
style A2 fill:#1a73e8,color:#fff
style A3 fill:#34a853,color:#fff
style A4 fill:#34a853,color:#fff
style A5 fill:#fbbc04,color:#000
style A6 fill:#ea4335,color:#fff
Troubleshooting¶
- Pod stuck in Pending:
Check the pod events for scheduling failures - the event message tells you exactly which constraint was violated:
kubectl describe pod <pod-name> | grep -A10 "Events:"
# Common messages:
# "0/N nodes are available: N node(s) didn't match Pod's node affinity/selector"
# "0/N nodes are available: N node(s) had untolerated taint {key: value}"
# "0/N nodes are available: N node(s) didn't match pod anti-affinity rules"
- Anti-affinity blocking new pods:
If you have more replicas than nodes and use requiredDuringScheduling anti-affinity, excess pods stay Pending. Switch to preferredDuringScheduling:
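A sketch of the soft variant (field names per the Kubernetes Pod spec; the label selector is illustrative):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-app   # illustrative label
          topologyKey: kubernetes.io/hostname
```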
- Taint applied but pods still running on the node:
NoSchedule only blocks new pods. Use NoExecute to evict existing pods:
kubectl describe nodes | grep -A3 "Taints:"
# Change NoSchedule to NoExecute if you want to evict running pods
kubectl taint nodes <node> key=value:NoExecute
- Labels not matching affinity rules:
Verify that node labels actually exist and match your affinity selectors:
kubectl get nodes --show-labels | grep <expected-label>
kubectl get node <node-name> -o jsonpath='{.metadata.labels}' | jq
- Topology spread constraints not balancing evenly:
Ensure your nodes have the correct topology labels (topology.kubernetes.io/zone, kubernetes.io/hostname):
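One way to check is to print both topology labels as columns (`-L` is the standard `kubectl get` flag for adding a label column):

```shell
kubectl get nodes -L topology.kubernetes.io/zone -L kubernetes.io/hostname
```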
Next Steps¶
- Explore Pod Priority and Preemption for workload prioritization.
- Learn about Descheduler for rebalancing pods after scheduling decisions.
- Try combining scheduling constraints with PodDisruptionBudgets (Lab 17) for production-grade high availability.
- Explore Custom Schedulers (Lab 19) for when built-in scheduling constraints aren’t enough.
- Practice scheduling tasks in the Kubernetes Scheduling Tasks section.
Additional Resources¶
Pod Disruption Budgets (PDB)¶
- In this lab, we will learn about `Pod Disruption Budgets (PDB)` in Kubernetes.
- We will explore how to define and implement PDBs to ensure application availability during voluntary disruptions, such as node maintenance or cluster upgrades.
- By the end of this lab, you will understand how to create and manage Pod Disruption Budgets to maintain the desired level of service availability in your Kubernetes cluster.
What will we learn?¶
- What Pod Disruption Budgets are and why they are important
- How PDBs protect applications during voluntary disruptions
- How to define PDBs using `minAvailable` or `maxUnavailable`
- How Kubernetes eviction policies interact with PDBs
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster
- Minikube (for feature gates configuration)
Introduction¶
- A `pod disruption budget` is an indicator of the number of disruptions that can be tolerated at a given time for a class of pods (a budget of faults).
- Disruptions may be caused by deliberate or accidental Pod deletion.
- Whenever a disruption to the pods in a service is calculated to cause the service to drop below the budget, the operation is paused until it can maintain the budget. This means that a `drain event` could be temporarily halted while it waits for more pods to become available, so that the budget isn’t crossed by evicting pods.
- You can specify Pod Disruption Budgets for Pods managed by these built-in Kubernetes controllers: `Deployment`, `ReplicationController`, `ReplicaSet`, `StatefulSet`
- For this tutorial you should get familiar with Kubernetes Eviction Policies, as it demonstrates how `Pod Disruption Budgets` handle evictions.
- As in the `Kubernetes Eviction Policies` tutorial, we start with
PDB Example¶
- In the sample below we will configure a `Pod Disruption Budget` which ensures that we always have at least 1 Nginx instance.
- First we need an Nginx Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
namespace: codewizard
labels:
app: nginx # <- We will use this name below
...
- Now we can create the `Pod Disruption Budget`:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 1 # <--- This ensures that we always have at least 1
  selector:
    matchLabels:
      app: nginx # <- The deployment app label
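For comparison, the same protection can be expressed the other way around with `maxUnavailable` (a sketch using the stable `policy/v1` API; the object name is illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb-max
spec:
  maxUnavailable: 1   # At most 1 nginx pod may be disrupted at a time
  selector:
    matchLabels:
      app: nginx
```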
Lab¶
01. Start minikube with Feature Gates
Step 01 - Start Minikube with Feature Gates¶
- Run the following command to start minikube with the required `Feature Gates` and `Eviction Signals`:
minikube start \
--extra-config=kubelet.eviction-hard="memory.available<480M" \
--extra-config=kubelet.eviction-pressure-transition-period="30s" \
--extra-config=kubelet.feature-gates="ExperimentalCriticalPodAnnotation=true"
- For more details about `Feature Gates`, read here.
- For more details about `eviction signals`, read here.
Step 02 - Check Node Pressure(s)¶
- Check the Node conditions to see whether there is any kind of “Pressure” by running the following:
kubectl describe node minikube | grep MemoryPressure
# Output should be similar to :
Conditions:
Type Status Reason Message
---- ------ ------ -------
MemoryPressure False KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False KubeletHasSufficientPID kubelet has sufficient PID available
Ready True KubeletReady kubelet is posting ready status. AppArmor enabled
...
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 750m (37%) 0 (0%)
memory 140Mi (6%) 340Mi (16%)
ephemeral-storage 0 (0%) 0 (0%)
Step 03 - Create 3 Pods Using 50 MB Each¶
- Create a file named `50MB-ram.yaml` with the following content:
# ./resources/50MB-ram.yaml
...
# 3 replicas
spec:
replicas: 3
# resources request and limits
resources:
requests:
memory: "50Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
- Create the pods with the following command:
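Assuming the manifest was saved at the path shown in the file's comment:

```shell
kubectl apply -f ./resources/50MB-ram.yaml
```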
Step 04 - Check Memory Pressure¶
- Now let’s check the Node conditions again to see if we have `MemoryPressure`:
kubectl describe node minikube | grep MemoryPressure
# Output should be similar to
MemoryPressure False ... KubeletHasSufficientMemory kubelet has sufficient memory available
Cleanup¶
Writing a Custom Scheduler¶
- `Scheduling` is the process of selecting a node for a Pod to run on.
- In this lab we will write our own Pod `scheduler`.
- It is probably not something you will ever need to do, but it is still good practice to understand how scheduling works in K8s and how you can extend it.
What will we learn?¶
- How scheduling works in Kubernetes
- How to write a custom scheduler
- How to assign a pod to a specific scheduler using `.spec.schedulerName`
- How scheduling profiles and extension points work
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster
- Docker installed (for building custom images)
Introduction¶
- See further information in the official documentation: Scheduler Configuration
- To schedule a given pod with a specific scheduler, specify the scheduler's name in the Pod spec's `.spec.schedulerName` field.
- Scheduling happens in a series of stages that are exposed through extension points.
- We can define several scheduling Profiles. A scheduling Profile allows you to configure the different stages of scheduling in the `kube-scheduler`.
Sample KubeSchedulerConfiguration¶
###
# Sample KubeSchedulerConfiguration
###
#
# You can configure `kube-scheduler` to run more than one profile.
# Each profile has an associated scheduler name and can have a different
# set of plugins configured in its extension points.
# With the following sample configuration,
# the scheduler will run with two profiles:
# - default plugins
# - all scoring plugins disabled
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
- schedulerName: no-scoring-scheduler
plugins:
preScore:
disabled:
- name: '*'
score:
disabled:
- name: '*'
- Once you have your scheduler running, you can reference it from any Pod spec:
# In this sample we use a Deployment, but it applies to any pod
...
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      # This is the important part of this file.
      # Here we define our custom scheduler
      schedulerName: CodeWizardScheduler # <------
      containers:
        - name: nginx
          image: nginx
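If a pod names a scheduler that nobody is actually running, it simply stays in `Pending`. A small helper (hypothetical; run it against a live cluster) to list each pod together with the scheduler responsible for it:

```bash
# List each pod in a namespace together with its scheduler.
# A pod whose schedulerName is not served by any running scheduler
# remains Pending until something binds it to a node.
pod_schedulers() {
  local ns="${1:-default}"
  kubectl get pods --namespace "${ns}" \
    --output jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.schedulerName}{"\n"}{end}'
}
```

Pods placed by the default scheduler show `default-scheduler`; ours will show `CodeWizardScheduler`.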
Sample Bash Scheduler¶
- The “trick” is to loop over all the pending pods and search for a match on the custom scheduler name in `spec.schedulerName`:
...
# Get a list of all pods in Pending state, as "namespace,name" pairs
for POD in $(kubectl get pods \
    --server ${CLUSTER_URL} \
    --all-namespaces \
    --output jsonpath='{range .items[*]}{.metadata.namespace},{.metadata.name}{"\n"}{end}' \
    --field-selector=status.phase==Pending);
do
    NAMESPACE=${POD%,*}
    NAME=${POD#*,}
    # Get the desired schedulerName if the pod has defined one
    CUSTOM_SCHEDULER_NAME=$(kubectl get pod ${NAME} \
        --namespace ${NAMESPACE} \
        --output jsonpath='{.spec.schedulerName}')
    # Check if the desired schedulerName is our custom one.
    # If it's a match, this is where our custom scheduler will "jump in"
    if [ "${CUSTOM_SCHEDULER_NAME}" == "${CUSTOM_SCHEDULER}" ];
    then
        # Do your magic here ......
        # Schedule the pod as you wish
    fi
...
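The "magic" of a minimal scheduler is usually a `Binding` object POSTed to the API server - the same call the default scheduler makes after picking a node. A sketch (pod and node names are placeholders) that builds the request body our loop could send:

```bash
# Build the Binding JSON that assigns a pod to a node.
# POSTing this to /api/v1/namespaces/<ns>/pods/<pod>/binding
# is how a scheduler actually places a pod.
binding_payload() {
  local pod="$1" node="$2"
  cat <<EOF
{
  "apiVersion": "v1",
  "kind": "Binding",
  "metadata": {"name": "${pod}"},
  "target": {"apiVersion": "v1", "kind": "Node", "name": "${node}"}
}
EOF
}

# Example call inside the loop above (CLUSTER_URL as defined in the lab;
# "my-pod" and "worker-1" are placeholders):
# curl -X POST -H "Content-Type: application/json" \
#   --data "$(binding_payload my-pod worker-1)" \
#   ${CLUSTER_URL}/api/v1/namespaces/default/pods/my-pod/binding
```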
Kube API Access from Pod¶
- In this lab, we will learn how to access the Kubernetes API from within a Pod.
- We will create a simple Pod that runs a script to query the Kubernetes API server and retrieve information about the cluster.
What will we learn?¶
- How to access the Kubernetes API from within a Pod
- How Kubernetes Service Account tokens are mounted inside pods
- How to build a custom Docker image for API access
- How to deploy and test the API query using Kustomize
Prerequisites¶
- A running Kubernetes cluster (`kubectl cluster-info` should work)
- `kubectl` configured against the cluster
- Docker installed (for building custom images)
Part 01 - Build the Docker Image¶
- To demonstrate the API query, we will build a custom Docker image.
- Optionally, you can use the pre-built image and skip this step.
Step 01 - The Script for Querying K8S API¶
- To access the K8S API from within a pod, we will use the following script:
# `api_query.sh`
#!/bin/sh
#################################
## Access the internal K8S API ##
#################################
# Point to the internal API server hostname
API_SERVER_URL=https://kubernetes.default.svc
# Path to ServiceAccount token
# The ServiceAccount credentials are automatically mounted into the pod by Kubernetes
SERVICE_ACCOUNT_FOLDER=/var/run/secrets/kubernetes.io/serviceaccount
# Read this Pod's namespace if required
# NAMESPACE=$(cat ${SERVICE_ACCOUNT_FOLDER}/namespace)
# Read the ServiceAccount bearer token
TOKEN=$(cat ${SERVICE_ACCOUNT_FOLDER}/token)
# Reference the internal certificate authority (CA)
CACERT=${SERVICE_ACCOUNT_FOLDER}/ca.crt
# Explore the API with TOKEN and the Certificate
curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${API_SERVER_URL}/api
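The same token can reach any endpoint the service account is authorized for. A sketch of a namespaced query (the default service account usually lacks `list pods` permission, so expect a 403 unless you add RBAC):

```bash
# List pods in this pod's own namespace via the internal API server.
# Requires RBAC granting "list" on pods to the pod's service account.
list_pods() {
  local sa=/var/run/secrets/kubernetes.io/serviceaccount
  local ns token
  ns=$(cat "${sa}/namespace")
  token=$(cat "${sa}/token")
  curl --cacert "${sa}/ca.crt" \
    --header "Authorization: Bearer ${token}" \
    "https://kubernetes.default.svc/api/v1/namespaces/${ns}/pods"
}
```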
Step 02 - Build the Docker Image¶
- For the pod image we will use the following Dockerfile:
# `Dockerfile`
FROM alpine
# Update and install dependencies
RUN apk add --update nodejs npm curl
# Copy the endpoint script
COPY api_query.sh .
# Set the execution bit
RUN chmod +x api_query.sh
Part 02 - Deploy the Pod to K8S¶
- Once the image is ready, we can deploy it as a pod to the cluster.
- The required resources are under the k8s folder.
Step 01 - Run Kustomization to Deploy¶
- Deploy to the cluster
# Remove old content if any
kubectl kustomize k8s | kubectl delete -f -
# Deploy the content
kubectl kustomize k8s | kubectl apply -f -
Step 02 - Query the K8S API¶
- Run the following script to verify that the connection to the API is working:
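The verification script is not reproduced here; a minimal sketch, assuming the pod is named `api-query` (check the manifests under the `k8s` folder for the actual name):

```bash
# Exec the query script inside the deployed pod; a JSON answer from
# /api proves that in-cluster API access works.
verify_api_access() {
  local pod="${1:-api-query}"
  kubectl exec "${pod}" -- sh ./api_query.sh
}
```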
Kubebuilder - Building Kubernetes Operators¶
- `Kubebuilder` is an SDK for building production-grade Kubernetes APIs and controllers (Operators) using Go and the `controller-runtime` library.
- Instead of writing low-level machinery by hand, `Kubebuilder` scaffolds everything - CRD types, RBAC manifests, Makefile targets, and the reconcile loop - so you can focus on business logic.
- In this lab we build a real WebApp Operator that manages a `WebApp` custom resource and automatically provisions the correct `Deployment`, `Service`, and `ConfigMap` in the cluster.
What will we learn?¶
- What the Operator pattern is and why it exists
- How `Kubebuilder` scaffolds a complete operator project
- How to define a Custom Resource Definition (CRD) with validation markers
- How to write a Reconciliation Loop using `controller-runtime`
- How to manage child resources (`Deployment`, `Service`, `ConfigMap`) and their ownership
- How to update Status subresources and surface conditions
- How to run the operator locally and in-cluster
- How to write admission webhooks for defaulting and validation
- How to write controller tests with `envtest`
- How to build and push the operator Docker image and deploy via Kustomize
Official Documentation & References¶
| Resource | Link |
|---|---|
| Kubebuilder Book (official) | book.kubebuilder.io |
| Kubebuilder GitHub | github.com/kubernetes-sigs/kubebuilder |
| controller-runtime docs | pkg.go.dev/sigs.k8s.io/controller-runtime |
| Kubernetes API Conventions | github.com/kubernetes/community/contributors/devel/sig-architecture/api-conventions.md |
| CRD Validation Markers | book.kubebuilder.io/reference/markers/crd-validation |
| RBAC Markers | book.kubebuilder.io/reference/markers/rbac |
| Operator SDK (alternative) | sdk.operatorframework.io |
| OperatorHub.io | operatorhub.io |
| envtest (controller tests) | book.kubebuilder.io/cronjob-tutorial/writing-tests |
| Go Modules | go.dev/ref/mod |
Introduction¶
What is the Operator Pattern?¶
- Kubernetes manages built-in resources (Pods, Deployments, Services) with built-in controllers that run a reconciliation loop.
- A Kubernetes Operator extends this pattern to your own domain-specific resources.
- An operator is a combination of:
- A Custom Resource Definition (CRD) - defines the new resource type and its schema in the Kubernetes API
- A Controller - watches for changes to the custom resource and reconciles the cluster state towards the desired state
flowchart LR
user["Developer\nkubectl apply -f webapp.yaml"] --> api["Kubernetes API Server"]
api --> etcd["etcd\n(stores WebApp object)"]
api --> ctrl["WebApp Controller\n(our operator)"]
ctrl -->|"Reconcile Loop\nCreate/Update/Delete"| deploy["Deployment"]
ctrl --> svc["Service"]
ctrl --> cm["ConfigMap"]
ctrl -->|"Status update"| api
When should you write an Operator?¶
| Use case | Example |
|---|---|
| Manage a stateful application lifecycle | Database cluster (create, backup, restore, scale, upgrade) |
| Encode operational runbook as code | Auto-healing, canary rollouts |
| Extend Kubernetes with domain knowledge | CI/CD pipelines, ML training jobs |
| Complex multi-resource coordination | Provision Deployment + Service + Certificate as a single object |
Kubebuilder vs Raw client-go¶
| | Raw client-go | Kubebuilder |
|---|---|---|
| Code scaffolding | Manual | Automated |
| CRD schema generation | Manual YAML | Auto-generated from Go struct + markers |
| RBAC generation | Manual | Auto-generated from //+kubebuilder:rbac markers |
| Controller boilerplate | Manual | Scaffolded |
| Testing framework | DIY | envtest built in |
| Webhook scaffolding | Manual | Scaffolded |
Terminology¶
| Term | Description |
|---|---|
| CRD | Custom Resource Definition - registers a new resource type with the Kubernetes API |
| CR | Custom Resource - an instance of a CRD (like a Pod is an instance of the Pod resource) |
| Operator | A controller that implements domain-specific logic for a custom resource |
| Reconciler | The Go struct that implements the Reconcile(ctx, req) method |
| Reconcile Loop | Watch β Diff β Act cycle that continuously drives the cluster toward desired state |
| Desired State | What the user declared in the CR spec |
| Observed State | What is actually running in the cluster |
| Finalizer | A string added to .metadata.finalizers; prevents deletion until cleanup logic finished |
| Owner Reference | A pointer from a child resource (e.g. Deployment) back to its parent (WebApp CR) |
| Status Subresource | A separate sub-API for writing .status without triggering watches on .spec |
| Marker | A Go comment like //+kubebuilder:... that drives code/manifest generation |
| envtest | A test environment that starts a real kube-apiserver + etcd binary for integration tests |
| Webhook | HTTP server the API server calls before creating/updating resources; used for defaulting and validation |
Architecture¶
graph TB
subgraph dev["Developer Workflow"]
code["Write Go types + reconciler"]
gen["make generate\n(deepcopy funcs)"]
manifest["make manifests\n(CRD YAML, RBAC YAML)"]
test["make test\n(envtest suite)"]
docker["make docker-build docker-push\n(build operator image)"]
deploy["make deploy\n(kustomize | kubectl apply)"]
end
subgraph cluster["Kubernetes Cluster"]
crd["CRD: webapps.apps.codewizard.io"]
ns["Namespace: webapp-system"]
ctrl_pod["Controller Pod\n(our operator binary)"]
webhook_svc["Webhook Service"]
subgraph managed["User Namespace"]
webapp_cr["WebApp CR\n(desired state)"]
cm["ConfigMap\n(HTML content)"]
dep["Deployment\n(nginx pods)"]
svc["Service\n(ClusterIP)"]
end
end
code --> gen --> manifest --> test --> docker --> deploy
deploy --> crd
deploy --> ns
deploy --> ctrl_pod
ctrl_pod -->|"watch + reconcile"| webapp_cr
ctrl_pod -->|"owns"| cm
ctrl_pod -->|"owns"| dep
ctrl_pod -->|"owns"| svc
ctrl_pod -->|"updates"| webapp_cr
Project Structure¶
After running kubebuilder init and kubebuilder create api, the project looks like:
webapp-operator/
βββ api/
β βββ v1/
β βββ groupversion_info.go # Group/Version registration
β βββ webapp_types.go # CRD Go types (Spec, Status, markers)
β βββ zz_generated.deepcopy.go # Auto-generated (make generate)
β
βββ internal/
β βββ controller/
β βββ webapp_controller.go # Reconcile() implementation
β βββ webapp_controller_test.go # envtest-based integration tests
β
βββ config/
β βββ crd/ # Generated CRD YAML manifests
β βββ rbac/ # Generated RBAC manifests
β βββ manager/ # Controller Deployment manifests
β βββ default/ # Kustomize base that wires everything together
β βββ webhook/ # Webhook certificates and Service
β βββ samples/
β βββ apps_v1_webapp.yaml # Example CR for testing
β
βββ cmd/
β βββ main.go # Entry point: registers scheme, starts manager
β
βββ Dockerfile # Multi-stage build for the operator image
βββ Makefile # All development targets
βββ go.mod # Go module definition
βββ go.sum # Dependency checksums
Common Kubebuilder Commands¶
kubebuilder init - Initialize a new operator project
Syntax: kubebuilder init --domain <domain> --repo <module>
Description: Scaffolds the complete project skeleton: cmd/main.go, Makefile, go.mod, base Kustomize configs, and .gitignore.
- `--domain` sets the API group suffix (e.g., resources will be `<group>.<domain>`)
- `--repo` sets the Go module path
- `--plugins=go/v4` (default) uses the latest stable Go plugin

```bash
# Initialize with domain codewizard.io
kubebuilder init \
  --domain codewizard.io \
  --repo codewizard.io/webapp-operator

# Initialize with an older plugin (kustomize only, no controller)
kubebuilder init \
  --domain my.domain \
  --repo my.domain/guestbook \
  --plugins=kustomize/v2-alpha

# Verify the scaffolded structure
ls -la
cat go.mod
cat Makefile
```
kubebuilder create api - Create a new CRD + Controller
Syntax: kubebuilder create api --group <group> --version <version> --kind <Kind>
Description: Scaffolds a new API type (CRD struct in api/<version>/<kind>_types.go) and a controller stub (internal/controller/<kind>_controller.go).
- Prompts whether to create the Resource (CRD type) and the Controller
- Adds the type to the scheme and wires the controller into `cmd/main.go`
- Can be run multiple times to add more API kinds to the same project

```bash
# Create WebApp API and controller
kubebuilder create api \
  --group apps \
  --version v1 \
  --kind WebApp

# Create a second kind in the same project
kubebuilder create api \
  --group apps \
  --version v1 \
  --kind WebAppPolicy

# Scaffold the CRD only, no controller
kubebuilder create api \
  --group apps \
  --version v1 \
  --kind Database \
  --controller=false

# View the generated type
cat api/v1/webapp_types.go
```
kubebuilder create webhook - Add defaulting or validation webhook
Syntax: kubebuilder create webhook --group <group> --version <version> --kind <Kind>
Description: Scaffolds a webhook server for the given kind, supporting defaulting (mutating) and validation (validating) webhooks.
- `--defaulting` generates a `Default()` method (MutatingAdmissionWebhook)
- `--programmatic-validation` generates `ValidateCreate/Update/Delete()` methods (ValidatingAdmissionWebhook)
- Also generates certificate management setup in `config/webhook/`
make generate - Generate DeepCopy functions
Syntax: make generate
Description: Runs controller-gen object to auto-generate DeepCopyObject() methods for all types. These are required by the Kubernetes runtime and must be regenerated after every change to *_types.go.
```bash
# Regenerate after changing types
make generate
# View generated file
cat api/v1/zz_generated.deepcopy.go
# What it runs under the hood:
# controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
```
make manifests - Generate CRD and RBAC manifests
Syntax: make manifests
Description: Runs controller-gen to generate CRD YAML, RBAC ClusterRole, and webhook manifests from Go markers. Must be run after every change to markers in *_types.go or *_controller.go.
```bash
# Generate all manifests
make manifests
# View generated CRD YAML
cat config/crd/bases/apps.codewizard.io_webapps.yaml
# View generated RBAC
cat config/rbac/role.yaml
# What it runs:
# controller-gen rbac:roleName=manager-role crd webhook paths="./..." \
# output:crd:artifacts:config=config/crd/bases
```
make install - Install CRDs into the cluster
Syntax: make install
Description: Applies the generated CRD manifests to the currently active cluster using kubectl apply. After this, kubectl get webapps will work.
```bash
# Install CRDs
make install
# Verify CRD is registered
kubectl get crds | grep codewizard
# Describe the CRD schema
kubectl describe crd webapps.apps.codewizard.io
# What it runs:
# kubectl apply -k config/crd
```
make run - Run the controller locally
Syntax: make run
Description: Runs the controller binary on your local machine using the kubeconfig in ~/.kube/config. The controller connects to the cluster and reconciles resources but runs outside the cluster (useful for development).
```bash
# Run controller locally (uses current kubeconfig)
make run
# Run with extra verbosity (assumes your Makefile's run target forwards ARGS;
# the default scaffold runs `go run ./cmd/main.go` without arguments)
make run ARGS="--zap-log-level=debug"
# Run with leader election disabled (single instance mode)
make run ARGS="--leader-elect=false"
```
make test - Run controller tests with envtest
Syntax: make test
Description: Runs the full test suite using envtest, which starts a real kube-apiserver and etcd binary locally - no cluster required.
```bash
# Run all tests
make test
# Run with verbose output (assumes the Makefile test target forwards ARGS)
make test ARGS="-v"
# Run only specific tests
make test ARGS="-run TestWebAppReconciler"
# Run with coverage report (target name may vary by scaffold version)
make test-coverage
```
make docker-build - Build the operator image
Syntax: make docker-build IMG=<image:tag>
Description: Builds a multi-stage Docker image containing the compiled operator binary.
```bash
# Build with a custom image name
make docker-build IMG=ghcr.io/myorg/webapp-operator:v0.1.0
# Build and push in one step
make docker-build docker-push IMG=ghcr.io/myorg/webapp-operator:v0.1.0
# Build for multiple platforms (requires buildx)
make docker-buildx IMG=ghcr.io/myorg/webapp-operator:v0.1.0
```
make deploy - Deploy the operator to the cluster
Syntax: make deploy IMG=<image:tag>
Description: Uses Kustomize to render all manifests (CRD, RBAC, Deployment, webhook certs) and applies them to the cluster.
```bash
# Deploy with a specific image
make deploy IMG=ghcr.io/myorg/webapp-operator:v0.1.0
# Verify the operator pod is running
kubectl get pods -n webapp-system
# Check operator logs
kubectl logs -n webapp-system -l control-plane=controller-manager -f
# Undeploy
make undeploy
```
Lab¶
Part 01 - Prerequisites¶
01.01 Install Go¶
Kubebuilder requires Go 1.21+.
Verify¶
01.02 Install Kubebuilder¶
# Detect OS and architecture
OS=$(go env GOOS)
ARCH=$(go env GOARCH)
# Download the latest kubebuilder binary
curl -L "https://go.kubebuilder.io/dl/latest/${OS}/${ARCH}" \
-o /tmp/kubebuilder
sudo mv /tmp/kubebuilder /usr/local/bin/kubebuilder
sudo chmod +x /usr/local/bin/kubebuilder
Verify¶
01.03 Install controller-gen and other tools¶
Kubebuilder uses several helper binaries installed by make. They are downloaded automatically on first use:
# These are auto-downloaded by the Makefile when needed:
# - controller-gen (code + manifest generation)
# - envtest (testing framework binaries)
# - kustomize (manifest composition)
# - golangci-lint (linter)
# You can pre-download them:
make controller-gen
make kustomize
01.04 Verify cluster access¶
Part 02 - Initialize the Project¶
We will build a WebApp Operator that manages a WebApp custom resource.
A WebApp CR creates a Deployment (nginx), a Service (ClusterIP), and a ConfigMap (HTML content) - all owned and reconciled by our controller.
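For orientation, a `WebApp` CR will eventually look like this (a sketch; the field names match the spec we define in Part 04):

```yaml
apiVersion: apps.codewizard.io/v1
kind: WebApp
metadata:
  name: webapp-sample
spec:
  replicas: 2
  image: nginx:1.25.3
  message: "Hello from the WebApp Operator"
  port: 80
  serviceType: ClusterIP
```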
02.01 Create and enter the project directory¶
02.02 Initialize the Kubebuilder project¶
Output:
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
Get controller runtime:
$ go get sigs.k8s.io/controller-runtime@v0.18.x
go: downloading sigs.k8s.io/controller-runtime v0.18.x
...
Next: define a resource with:
$ kubebuilder create api
02.03 Inspect the scaffolded files¶
# Project layout
find . -type f | grep -v '.git\|vendor\|_test' | sort
# Go module
cat go.mod
# Entrypoint
cat cmd/main.go
# Makefile targets
make help
The cmd/main.go sets up the manager - it starts the controller, serves metrics, and manages leader election. You rarely need to edit this file by hand.
Part 03 - Create the API (CRD + Controller Scaffold)¶
03.01 Scaffold the WebApp API¶
When prompted:
03.02 Inspect the generated files¶
# Type definition (we will fill this in next)
cat api/v1/webapp_types.go
# Reconciler stub (we will fill this in next)
cat internal/controller/webapp_controller.go
# main.go is updated to register WebApp
grep WebApp cmd/main.go
Part 04 - Define the CRD Types¶
This is the heart of the API definition. Open api/v1/webapp_types.go and replace its contents with the following:
// api/v1/webapp_types.go
package v1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// WebAppSpec defines the desired state of WebApp.
type WebAppSpec struct {
// Replicas is the desired number of nginx Pods.
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=10
// +kubebuilder:default=1
Replicas int32 `json:"replicas,omitempty"`
// Image is the nginx container image (repository:tag).
// +kubebuilder:default="nginx:1.25.3"
// +kubebuilder:validation:MinLength=1
Image string `json:"image,omitempty"`
// Message is the HTML body text served by nginx.
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=500
Message string `json:"message"`
// Port is the container port nginx listens on.
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=65535
// +kubebuilder:default=80
Port int32 `json:"port,omitempty"`
// ServiceType controls how the Service is exposed.
// +kubebuilder:validation:Enum=ClusterIP;NodePort;LoadBalancer
// +kubebuilder:default=ClusterIP
ServiceType string `json:"serviceType,omitempty"`
}
// WebAppPhase is a simple enum for the overall lifecycle state.
// +kubebuilder:validation:Enum=Pending;Running;Degraded;Failed
type WebAppPhase string
const (
WebAppPhasePending WebAppPhase = "Pending"
WebAppPhaseRunning WebAppPhase = "Running"
WebAppPhaseDegraded WebAppPhase = "Degraded"
WebAppPhaseFailed WebAppPhase = "Failed"
)
// WebAppStatus defines the observed state of WebApp.
type WebAppStatus struct {
// AvailableReplicas is the number of Pods in the Ready state.
AvailableReplicas int32 `json:"availableReplicas,omitempty"`
// ReadyReplicas is the number of Pods that have passed readiness checks.
ReadyReplicas int32 `json:"readyReplicas,omitempty"`
// Phase is a high-level summary of the WebApp lifecycle.
Phase WebAppPhase `json:"phase,omitempty"`
// DeploymentName is the name of the managed Deployment.
DeploymentName string `json:"deploymentName,omitempty"`
// ServiceName is the name of the managed Service.
ServiceName string `json:"serviceName,omitempty"`
// Conditions holds standard API conditions.
// +listType=map
// +listMapKey=type
Conditions []metav1.Condition `json:"conditions,omitempty"`
}
// Condition type constants
const (
// ConditionTypeAvailable means the WebApp has at least one ready pod.
ConditionTypeAvailable = "Available"
// ConditionTypeProgressing means a rollout or scale is in progress.
ConditionTypeProgressing = "Progressing"
// ConditionTypeDegraded means some (but not all) replicas are ready.
ConditionTypeDegraded = "Degraded"
)
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:resource:shortName=wa,categories=all
//+kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=".spec.replicas"
//+kubebuilder:printcolumn:name="Available",type=integer,JSONPath=".status.availableReplicas"
//+kubebuilder:printcolumn:name="Phase",type=string,JSONPath=".status.phase"
//+kubebuilder:printcolumn:name="Image",type=string,JSONPath=".spec.image"
//+kubebuilder:printcolumn:name="Age",type=date,JSONPath=".metadata.creationTimestamp"
// WebApp is the Schema for the webapps API.
// It provisions a Deployment, Service, and ConfigMap that serve the configured HTML page.
type WebApp struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec WebAppSpec `json:"spec,omitempty"`
Status WebAppStatus `json:"status,omitempty"`
}
//+kubebuilder:object:root=true
// WebAppList contains a list of WebApp.
type WebAppList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []WebApp `json:"items"`
}
func init() {
SchemeBuilder.Register(&WebApp{}, &WebAppList{})
}
Marker Reference¶
| Marker | Effect |
|---|---|
| `//+kubebuilder:object:root=true` | Marks this type as a root object (has its own API endpoint) |
| `//+kubebuilder:subresource:status` | Generates a `/status` sub-resource (status updates don’t trigger spec watches) |
| `//+kubebuilder:resource:shortName=wa` | Allows `kubectl get wa` as a shorthand |
| `//+kubebuilder:printcolumn:...` | Extra columns shown by `kubectl get wa` |
| `//+kubebuilder:validation:Minimum=1` | Adds server-side validation to the CRD schema |
| `//+kubebuilder:default=1` | Sets a default value when the field is omitted |
| `//+kubebuilder:validation:Enum=...` | Restricts the field to a fixed set of values |
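Once the CRD is installed in a cluster, these markers are enforced server-side. A hypothetical check you can run against a live cluster:

```bash
# Server-side dry-run: replicas=99 violates the Maximum=10 marker,
# so the API server should reject this WebApp without persisting it.
validate_replicas_marker() {
  kubectl apply --dry-run=server -f - <<EOF
apiVersion: apps.codewizard.io/v1
kind: WebApp
metadata:
  name: invalid-demo
spec:
  message: "too many replicas"
  replicas: 99
EOF
}
```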
04.01 Generate DeepCopy functions¶
After every change to *_types.go run:
This auto-generates api/v1/zz_generated.deepcopy.go which implements DeepCopyObject() - required by the Kubernetes runtime for garbage collection and caching.
04.02 Generate CRD manifest¶
Inspect the generated CRD YAML:
You will see the full OpenAPI v3 schema, validation rules, printer columns, and status subresource settings - all derived from the Go markers.
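For example, the `replicas` field alone should yield a schema fragment roughly like this (illustrative excerpt of what `controller-gen` emits from the markers above):

```yaml
replicas:
  default: 1
  description: Replicas is the desired number of nginx Pods.
  format: int32
  maximum: 10
  minimum: 1
  type: integer
```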
Part 05 - Implement the Reconciler¶
Open internal/controller/webapp_controller.go and replace its contents with the full reconciler:
// internal/controller/webapp_controller.go
package controller
import (
"context"
"fmt"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/api/meta"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
webappv1 "codewizard.io/webapp-operator/api/v1"
)
// WebAppReconciler reconciles a WebApp object.
type WebAppReconciler struct {
client.Client
Scheme *runtime.Scheme
}
// RBAC markers - these generate config/rbac/role.yaml when `make manifests` is run.
//
//+kubebuilder:rbac:groups=apps.codewizard.io,resources=webapps,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=apps.codewizard.io,resources=webapps/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=apps.codewizard.io,resources=webapps/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=configmaps,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=events,verbs=create;patch
// Reconcile is the main reconciliation loop.
// It is called whenever a WebApp CR (or a resource it owns) changes.
func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// ββ Step 1: Fetch the WebApp instance ββββββββββββββββββββββββββββββββββββ
webapp := &webappv1.WebApp{}
if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
if errors.IsNotFound(err) {
// Object was deleted before we could reconcile - nothing to do.
logger.Info("WebApp not found, likely deleted", "name", req.Name)
return ctrl.Result{}, nil
}
return ctrl.Result{}, fmt.Errorf("fetching WebApp: %w", err)
}
logger.Info("Reconciling WebApp",
"name", webapp.Name,
"namespace", webapp.Namespace,
"replicas", webapp.Spec.Replicas)
// ββ Step 2: Reconcile ConfigMap (HTML content) βββββββββββββββββββββββββββ
if err := r.reconcileConfigMap(ctx, webapp); err != nil {
return ctrl.Result{}, fmt.Errorf("reconciling ConfigMap: %w", err)
}
// ββ Step 3: Reconcile Deployment βββββββββββββββββββββββββββββββββββββββββ
deployment, err := r.reconcileDeployment(ctx, webapp)
if err != nil {
return ctrl.Result{}, fmt.Errorf("reconciling Deployment: %w", err)
}
// ββ Step 4: Reconcile Service βββββββββββββββββββββββββββββββββββββββββββββ
if err := r.reconcileService(ctx, webapp); err != nil {
return ctrl.Result{}, fmt.Errorf("reconciling Service: %w", err)
}
// ββ Step 5: Update Status βββββββββββββββββββββββββββββββββββββββββββββββββ
if err := r.updateStatus(ctx, webapp, deployment); err != nil {
return ctrl.Result{}, fmt.Errorf("updating status: %w", err)
}
return ctrl.Result{}, nil
}
// ββ reconcileConfigMap ensures the HTML ConfigMap exists and is up-to-date. ββ
func (r *WebAppReconciler) reconcileConfigMap(ctx context.Context, webapp *webappv1.WebApp) error {
logger := log.FromContext(ctx)
desired := &corev1.ConfigMap{
ObjectMeta: metav1.ObjectMeta{
Name: webapp.Name + "-html",
Namespace: webapp.Namespace,
Labels: labelsForWebApp(webapp.Name),
},
Data: map[string]string{
"index.html": fmt.Sprintf(`<!DOCTYPE html>
<html>
<head><title>%s</title></head>
<body>
<h1>%s</h1>
<p>Managed by the <strong>WebApp Operator</strong> | Instance: %s</p>
</body>
</html>`, webapp.Spec.Message, webapp.Spec.Message, webapp.Name),
},
}
// Set the WebApp as the owner of the ConfigMap.
// When the WebApp CR is deleted, Kubernetes garbage-collects the ConfigMap automatically.
if err := ctrl.SetControllerReference(webapp, desired, r.Scheme); err != nil {
return err
}
// Fetch the existing ConfigMap
existing := &corev1.ConfigMap{}
err := r.Get(ctx, types.NamespacedName{Name: desired.Name, Namespace: desired.Namespace}, existing)
if errors.IsNotFound(err) {
logger.Info("Creating ConfigMap", "name", desired.Name)
return r.Create(ctx, desired)
}
if err != nil {
return err
}
// Update if the content has changed
if existing.Data["index.html"] != desired.Data["index.html"] {
existing.Data = desired.Data
logger.Info("Updating ConfigMap", "name", existing.Name)
return r.Update(ctx, existing)
}
return nil
}
// ββ reconcileDeployment ensures the nginx Deployment exists and matches spec. ββ
func (r *WebAppReconciler) reconcileDeployment(ctx context.Context, webapp *webappv1.WebApp) (*appsv1.Deployment, error) {
logger := log.FromContext(ctx)
labels := labelsForWebApp(webapp.Name)
replicas := webapp.Spec.Replicas
desired := &appsv1.Deployment{
ObjectMeta: metav1.ObjectMeta{
Name: webapp.Name,
Namespace: webapp.Namespace,
Labels: labels,
},
Spec: appsv1.DeploymentSpec{
Replicas: &replicas,
Selector: &metav1.LabelSelector{MatchLabels: labels},
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{Labels: labels},
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "nginx",
Image: webapp.Spec.Image,
ImagePullPolicy: corev1.PullIfNotPresent,
Ports: []corev1.ContainerPort{
{ContainerPort: webapp.Spec.Port, Protocol: corev1.ProtocolTCP},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "html",
MountPath: "/usr/share/nginx/html",
},
},
ReadinessProbe: &corev1.Probe{
ProbeHandler: corev1.ProbeHandler{
HTTPGet: &corev1.HTTPGetAction{
Path: "/",
Port: intOrString(webapp.Spec.Port),
},
},
InitialDelaySeconds: 5,
PeriodSeconds: 10,
},
LivenessProbe: &corev1.Probe{
ProbeHandler: corev1.ProbeHandler{
HTTPGet: &corev1.HTTPGetAction{
Path: "/",
Port: intOrString(webapp.Spec.Port),
},
},
InitialDelaySeconds: 15,
PeriodSeconds: 20,
},
},
},
Volumes: []corev1.Volume{
{
Name: "html",
VolumeSource: corev1.VolumeSource{
ConfigMap: &corev1.ConfigMapVolumeSource{
LocalObjectReference: corev1.LocalObjectReference{
Name: webapp.Name + "-html",
},
},
},
},
},
},
},
},
}
if err := ctrl.SetControllerReference(webapp, desired, r.Scheme); err != nil {
return nil, err
}
existing := &appsv1.Deployment{}
err := r.Get(ctx, types.NamespacedName{Name: desired.Name, Namespace: desired.Namespace}, existing)
if errors.IsNotFound(err) {
logger.Info("Creating Deployment", "name", desired.Name)
if err := r.Create(ctx, desired); err != nil {
return nil, err
}
return desired, nil
}
if err != nil {
return nil, err
}
// Reconcile mutable fields: replicas and image
needsUpdate := false
if *existing.Spec.Replicas != replicas {
existing.Spec.Replicas = &replicas
needsUpdate = true
}
if existing.Spec.Template.Spec.Containers[0].Image != webapp.Spec.Image {
existing.Spec.Template.Spec.Containers[0].Image = webapp.Spec.Image
needsUpdate = true
}
if needsUpdate {
logger.Info("Updating Deployment", "name", existing.Name,
"replicas", replicas, "image", webapp.Spec.Image)
if err := r.Update(ctx, existing); err != nil {
return nil, err
}
}
return existing, nil
}
// ββ reconcileService ensures the Service exists and matches spec. ββββββββββββββ
func (r *WebAppReconciler) reconcileService(ctx context.Context, webapp *webappv1.WebApp) error {
logger := log.FromContext(ctx)
labels := labelsForWebApp(webapp.Name)
svcType := corev1.ServiceType(webapp.Spec.ServiceType)
desired := &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: webapp.Name,
Namespace: webapp.Namespace,
Labels: labels,
},
Spec: corev1.ServiceSpec{
Selector: labels,
Type: svcType,
Ports: []corev1.ServicePort{
{
Port: webapp.Spec.Port,
TargetPort: intOrString(webapp.Spec.Port),
Protocol: corev1.ProtocolTCP,
},
},
},
}
if err := ctrl.SetControllerReference(webapp, desired, r.Scheme); err != nil {
return err
}
existing := &corev1.Service{}
err := r.Get(ctx, types.NamespacedName{Name: desired.Name, Namespace: desired.Namespace}, existing)
if errors.IsNotFound(err) {
logger.Info("Creating Service", "name", desired.Name)
return r.Create(ctx, desired)
}
if err != nil {
return err
}
// Reconcile Service type (immutable field - recreate required)
if existing.Spec.Type != svcType {
logger.Info("Recreating Service due to type change", "old", existing.Spec.Type, "new", svcType)
if err := r.Delete(ctx, existing); err != nil {
return err
}
return r.Create(ctx, desired)
}
return nil
}
// ββ updateStatus computes and persists the WebApp status. βββββββββββββββββββββββ
func (r *WebAppReconciler) updateStatus(ctx context.Context, webapp *webappv1.WebApp, deployment *appsv1.Deployment) error {
// Work on a copy to avoid mutating the cached object
updated := webapp.DeepCopy()
available := deployment.Status.AvailableReplicas
ready := deployment.Status.ReadyReplicas
updated.Status.AvailableReplicas = available
updated.Status.ReadyReplicas = ready
updated.Status.DeploymentName = deployment.Name
updated.Status.ServiceName = webapp.Name
// Compute phase
switch {
case available == 0:
updated.Status.Phase = webappv1.WebAppPhasePending
case ready < webapp.Spec.Replicas:
updated.Status.Phase = webappv1.WebAppPhaseDegraded
default:
updated.Status.Phase = webappv1.WebAppPhaseRunning
}
// Set the Available condition
availableCond := metav1.Condition{
Type: webappv1.ConditionTypeAvailable,
ObservedGeneration: webapp.Generation,
LastTransitionTime: metav1.Now(),
}
if available >= webapp.Spec.Replicas {
availableCond.Status = metav1.ConditionTrue
availableCond.Reason = "DeploymentAvailable"
availableCond.Message = fmt.Sprintf("%d/%d replicas are available", available, webapp.Spec.Replicas)
} else {
availableCond.Status = metav1.ConditionFalse
availableCond.Reason = "DeploymentUnavailable"
availableCond.Message = fmt.Sprintf("only %d/%d replicas are available", available, webapp.Spec.Replicas)
}
meta.SetStatusCondition(&updated.Status.Conditions, availableCond)
// Only issue an update if anything changed
if updated.Status.Phase != webapp.Status.Phase ||
updated.Status.AvailableReplicas != webapp.Status.AvailableReplicas ||
updated.Status.ReadyReplicas != webapp.Status.ReadyReplicas {
return r.Status().Update(ctx, updated)
}
return nil
}
// ── SetupWithManager wires the controller into the manager. ────────────────────
func (r *WebAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
// Primary watch: WebApp CRs
For(&webappv1.WebApp{}).
// Secondary watches: owned resources - any change triggers reconciliation
Owns(&appsv1.Deployment{}).
Owns(&corev1.Service{}).
Owns(&corev1.ConfigMap{}).
Complete(r)
}
// ── Helpers ────────────────────────────────────────────────────────────────────
func labelsForWebApp(name string) map[string]string {
return map[string]string{
"app.kubernetes.io/name": "webapp",
"app.kubernetes.io/instance": name,
"app.kubernetes.io/managed-by": "webapp-operator",
}
}
func intOrString(port int32) intstr.IntOrString {
return intstr.FromInt32(port)
}
Add the missing import¶
The `intOrString` helper uses `k8s.io/apimachinery/pkg/util/intstr`. Add it to the import block:
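A minimal sketch of the addition; the rest of the import list is elided here:

```go
import (
    // ...existing imports...
    "k8s.io/apimachinery/pkg/util/intstr"
)
```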
Understanding the Reconcile Loop¶
flowchart TD
trigger["Event: WebApp changed\nor Deployment / Service / ConfigMap changed"]
-->fetch["r.Get(ctx, req, &webapp)\nFetch current desired state"]
-->notfound{Not found?}
notfound -->|yes| done["return - resource deleted, nothing to do"]
notfound -->|no| cm["reconcileConfigMap()\nCreate or update HTML ConfigMap"]
-->dep["reconcileDeployment()\nCreate or update nginx Deployment"]
-->svc["reconcileService()\nCreate or update Service"]
-->status["updateStatus()\nCompute phase + conditions\nr.Status().Update()"]
-->requeue["return ctrl.Result{}"]
Key controller-runtime Concepts¶
| Concept | Code | Explanation |
|---|---|---|
| Fetch CR | `r.Get(ctx, req.NamespacedName, webapp)` | Always read fresh from the API server |
| IsNotFound | `errors.IsNotFound(err)` | Distinguish “doesn’t exist” from real errors |
| Owner reference | `ctrl.SetControllerReference(webapp, child, r.Scheme)` | Garbage-collect the child when the parent is deleted |
| Status update | `r.Status().Update(ctx, updated)` | Use the status subresource, not `r.Update` |
| Watch child | `.Owns(&appsv1.Deployment{})` | Re-queue the parent when a child changes |
| Logger | `log.FromContext(ctx)` | Context-scoped structured logger |
Part 06 - Install CRDs and Run Locally¶
06.01 Install CRDs into the cluster¶
# Generate manifests from markers
make manifests
# Apply CRDs to the cluster
make install
# Verify
kubectl get crds | grep codewizard
kubectl describe crd webapps.apps.codewizard.io | grep -A 20 "COLUMNS\|Validation"
06.02 Verify the short name works¶
# The //+kubebuilder:resource:shortName=wa marker enables this
kubectl get wa
# No resources found in default namespace.
06.03 Run the controller locally¶
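The command itself was omitted above; per the Part 14 cheatsheet, the controller runs locally (against your current kubeconfig context) with:

```shell
make run
```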
You will see logs like:
INFO Starting manager
INFO Starting Controller {"controller": "webapp"}
INFO Starting workers {"controller": "webapp", "worker count": 1}
Leave this running in one terminal and open a second terminal for the next steps.
Part 07 - Create Your First WebApp CR¶
07.01 Apply the sample CR¶
Create config/samples/apps_v1_webapp.yaml:
apiVersion: apps.codewizard.io/v1
kind: WebApp
metadata:
name: my-webapp
namespace: default
spec:
replicas: 2
image: nginx:1.25.3
message: "Hello from the WebApp Operator!"
port: 80
serviceType: ClusterIP
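Then apply it (the same sample path is used later when deploying in-cluster):

```shell
kubectl apply -f config/samples/apps_v1_webapp.yaml
```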
07.02 Watch the controller create child resources¶
In your second terminal:
# Watch pods appear
kubectl get pods -l app.kubernetes.io/name=webapp -w
# Check all resources created by the operator
kubectl get deployment,service,configmap -l app.kubernetes.io/managed-by=webapp-operator
Expected output:
NAME READY UP-TO-DATE AVAILABLE
deployment.apps/my-webapp 2/2 2 2
NAME TYPE CLUSTER-IP PORT(S)
service/my-webapp ClusterIP 10.96.x.x 80/TCP
NAME DATA
configmap/my-webapp-html 1
07.03 Inspect the WebApp status¶
kubectl get wa
# Output (notice the printer columns from our markers):
# NAME REPLICAS AVAILABLE PHASE IMAGE AGE
# my-webapp 2 2 Running nginx:1.25.3 30s
# Full status
kubectl get wa my-webapp -o jsonpath='{.status}' | jq .
# Check conditions
kubectl get wa my-webapp -o jsonpath='{.status.conditions}' | jq .
07.04 Test the application¶
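The lab text jumps straight to the expected output; the commands below follow the same port-forward pattern used later in step 09.03:

```shell
# Forward the Service locally and fetch the generated page
kubectl port-forward svc/my-webapp 8080:80 &
sleep 2
curl -s http://localhost:8080
kill %1
```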
Expected:
<!DOCTYPE html>
<html>
<head><title>Hello from the WebApp Operator!</title></head>
<body>
<h1>Hello from the WebApp Operator!</h1>
<p>Managed by the <strong>WebApp Operator</strong> | Instance: my-webapp</p>
</body>
</html>
Part 08 - Self-Healing: The Reconciler Restores Deleted Resources¶
This is one of the most powerful operator features: if someone manually deletes or modifies a child resource, the operator recreates it immediately.
08.01 Delete the Deployment manually¶
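Delete the child Deployment out from under the operator:

```shell
kubectl delete deployment my-webapp
```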
08.02 Watch the operator restore it¶
# In the `make run` terminal you will see:
# INFO Reconciling WebApp {"name": "my-webapp"}
# INFO Creating Deployment {"name": "my-webapp"}
kubectl get deployment my-webapp
# NAME READY UP-TO-DATE AVAILABLE AGE
# my-webapp 2/2 2 2 5s
The deployment was recreated in seconds by the reconcile loop.
08.03 Why does this work?¶
When .Owns(&appsv1.Deployment{}) is set in SetupWithManager, controller-runtime watches all Deployments that have an owner reference pointing to a WebApp. Any change (including deletion) enqueues the parent WebApp for reconciliation.
Part 09 - Update the CR and Observe Reconciliation¶
09.01 Scale up to 4 replicas¶
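The patch command was omitted; following the merge-patch pattern used elsewhere in this lab:

```shell
kubectl patch wa my-webapp --type=merge \
  -p '{"spec":{"replicas":4}}'
```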
09.02 Watch the Deployment scale¶
kubectl get pods -l app.kubernetes.io/name=webapp -w
# NAME READY STATUS ...
# my-webapp-xxx 1/1 Running
# my-webapp-yyy 1/1 Running
# my-webapp-zzz 1/1 Running β new
# my-webapp-aaa 1/1 Running β new
kubectl get wa my-webapp
# NAME REPLICAS AVAILABLE PHASE ...
# my-webapp 4 4 Running
09.03 Update the message¶
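First patch the message. The exact text below is illustrative; any value containing "Updated" works with the `grep` check that follows:

```shell
kubectl patch wa my-webapp --type=merge \
  -p '{"spec":{"message":"Updated message from the operator!"}}'
```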
kubectl port-forward svc/my-webapp 8080:80 &
sleep 2
curl http://localhost:8080 | grep "Updated"
kill %1
09.04 Update the image¶
kubectl patch wa my-webapp --type=merge \
-p '{"spec":{"image":"nginx:1.26.0"}}'
kubectl get deployment my-webapp -o jsonpath='{.spec.template.spec.containers[0].image}'
# nginx:1.26.0
Part 10 - Add a Finalizer for Cleanup Logic¶
Finalizers let you run custom cleanup code before the resource is actually deleted from etcd.
10.01 Add the finalizer constant¶
In internal/controller/webapp_controller.go add:
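The constant referenced by the handler below was omitted. The value shown follows the common `<domain>/finalizer` convention and is an assumption; any unique string works:

```go
// webappFinalizer marks WebApp objects that need cleanup before deletion.
// (The exact value is a convention-based assumption.)
const webappFinalizer = "apps.codewizard.io/finalizer"
```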
10.02 Add finalizer handling to Reconcile¶
Insert this block at the beginning of the Reconcile() function, after fetching the WebApp:
// ── Finalizer handling ─────────────────────────────────────────────────────────
if webapp.DeletionTimestamp.IsZero() {
// Not being deleted - ensure finalizer is present
if !controllerutil.ContainsFinalizer(webapp, webappFinalizer) {
controllerutil.AddFinalizer(webapp, webappFinalizer)
if err := r.Update(ctx, webapp); err != nil {
return ctrl.Result{}, err
}
return ctrl.Result{}, nil // the update event triggers a fresh reconcile
}
} else {
// Being deleted - run cleanup before allowing deletion
if controllerutil.ContainsFinalizer(webapp, webappFinalizer) {
logger.Info("Running finalizer cleanup", "name", webapp.Name)
// (perform any external cleanup here, e.g. cloud resources, DNS records)
// Remove finalizer - Kubernetes will then delete the object
controllerutil.RemoveFinalizer(webapp, webappFinalizer)
if err := r.Update(ctx, webapp); err != nil {
return ctrl.Result{}, err
}
}
return ctrl.Result{}, nil
}
Add "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil" to imports.
10.03 Test the finalizer¶
# Delete the WebApp
kubectl delete wa my-webapp
# Before the finalizer is removed, the object shows DeletionTimestamp
kubectl get wa my-webapp -o jsonpath='{.metadata.deletionTimestamp}'
# After cleanup the object and all owned resources disappear
kubectl get deployment,service,configmap -l app.kubernetes.io/managed-by=webapp-operator
# No resources found.
Part 11 - Add a Validation Webhook¶
Webhooks intercept API calls to validate or mutate resources before they are persisted to etcd.
11.01 Scaffold the webhook¶
kubebuilder create webhook \
--group apps \
--version v1 \
--kind WebApp \
--defaulting \
--programmatic-validation
This creates api/v1/webapp_webhook.go.
11.02 Implement defaulting (mutating webhook)¶
In api/v1/webapp_webhook.go, implement the Default() method:
func (r *WebApp) Default() {
log := logf.Log.WithName("webapp-resource")
log.Info("Applying defaults", "name", r.Name)
// Set default image if not provided
if r.Spec.Image == "" {
r.Spec.Image = "nginx:1.25.3"
}
// Set default replicas
if r.Spec.Replicas == 0 {
r.Spec.Replicas = 1
}
// Set default port
if r.Spec.Port == 0 {
r.Spec.Port = 80
}
// Set default service type
if r.Spec.ServiceType == "" {
r.Spec.ServiceType = "ClusterIP"
}
}
11.03 Implement validation (validating webhook)¶
func (r *WebApp) ValidateCreate() (admission.Warnings, error) {
return r.validateWebApp()
}
func (r *WebApp) ValidateUpdate(old runtime.Object) (admission.Warnings, error) {
return r.validateWebApp()
}
func (r *WebApp) ValidateDelete() (admission.Warnings, error) {
return nil, nil
}
func (r *WebApp) validateWebApp() (admission.Warnings, error) {
var errs field.ErrorList
// Replicas must be between 1 and 10
if r.Spec.Replicas < 1 || r.Spec.Replicas > 10 {
errs = append(errs, field.Invalid(
field.NewPath("spec", "replicas"),
r.Spec.Replicas,
"must be between 1 and 10",
))
}
// Message must not be empty
if r.Spec.Message == "" {
errs = append(errs, field.Required(
field.NewPath("spec", "message"),
"message is required and cannot be empty",
))
}
if len(errs) > 0 {
return nil, apierrors.NewInvalid(
schema.GroupKind{Group: "apps.codewizard.io", Kind: "WebApp"},
r.Name,
errs,
)
}
return nil, nil
}
11.04 Test webhook validation¶
# This should fail - replicas = 15 exceeds maximum of 10
kubectl apply -f - <<EOF
apiVersion: apps.codewizard.io/v1
kind: WebApp
metadata:
name: invalid-webapp
spec:
replicas: 15
message: "test"
image: nginx:1.25.3
port: 80
EOF
# Expected error:
# Error from server (WebApp.apps.codewizard.io "invalid-webapp" is invalid):
# spec.replicas: Invalid value: 15: must be between 1 and 10
Part 12 - Writing Controller Tests¶
Kubebuilder sets up envtest which runs a real kube-apiserver and etcd - no cluster needed.
12.01 Inspect the test suite setup¶
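The scaffolded suite file boots the envtest control plane before the specs run. The path below assumes the default Kubebuilder layout used elsewhere in this lab:

```shell
cat internal/controller/suite_test.go
```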
12.02 Write a reconciler integration test¶
Create internal/controller/webapp_controller_test.go:
package controller
import (
"context"
"time"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/types"
webappv1 "codewizard.io/webapp-operator/api/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var _ = Describe("WebApp Controller", func() {
const (
WebAppName = "test-webapp"
WebAppNamespace = "default"
timeout = time.Second * 30
interval = time.Millisecond * 250
)
ctx := context.Background()
Context("When creating a WebApp", func() {
It("should create a Deployment, Service and ConfigMap", func() {
// ── Create the WebApp CR ───────────────────────────────────────────────────────
webapp := &webappv1.WebApp{
ObjectMeta: metav1.ObjectMeta{
Name: WebAppName,
Namespace: WebAppNamespace,
},
Spec: webappv1.WebAppSpec{
Replicas: 2,
Image: "nginx:1.25.3",
Message: "Hello from test",
Port: 80,
ServiceType: "ClusterIP",
},
}
Expect(k8sClient.Create(ctx, webapp)).To(Succeed())
// ── Assert: Deployment is created ──────────────────────────────────────────────
deploymentLookup := types.NamespacedName{Name: WebAppName, Namespace: WebAppNamespace}
createdDeployment := &appsv1.Deployment{}
Eventually(func() error {
return k8sClient.Get(ctx, deploymentLookup, createdDeployment)
}, timeout, interval).Should(Succeed())
Expect(*createdDeployment.Spec.Replicas).To(Equal(int32(2)))
Expect(createdDeployment.Spec.Template.Spec.Containers[0].Image).To(Equal("nginx:1.25.3"))
// ── Assert: Service is created ─────────────────────────────────────────────────
createdService := &corev1.Service{}
Eventually(func() error {
return k8sClient.Get(ctx, deploymentLookup, createdService)
}, timeout, interval).Should(Succeed())
Expect(createdService.Spec.Type).To(Equal(corev1.ServiceTypeClusterIP))
// ── Assert: ConfigMap is created ───────────────────────────────────────────────
cmLookup := types.NamespacedName{Name: WebAppName + "-html", Namespace: WebAppNamespace}
createdCM := &corev1.ConfigMap{}
Eventually(func() error {
return k8sClient.Get(ctx, cmLookup, createdCM)
}, timeout, interval).Should(Succeed())
Expect(createdCM.Data["index.html"]).To(ContainSubstring("Hello from test"))
})
It("should update the Deployment when replicas change", func() {
// Patch replicas from 2 β 4
webapp := &webappv1.WebApp{}
Expect(k8sClient.Get(ctx, types.NamespacedName{
Name: WebAppName,
Namespace: WebAppNamespace,
}, webapp)).To(Succeed())
webapp.Spec.Replicas = 4
Expect(k8sClient.Update(ctx, webapp)).To(Succeed())
// Assert Deployment reflects new replica count
Eventually(func() int32 {
dep := &appsv1.Deployment{}
_ = k8sClient.Get(ctx, types.NamespacedName{
Name: WebAppName,
Namespace: WebAppNamespace,
}, dep)
return *dep.Spec.Replicas
}, timeout, interval).Should(Equal(int32(4)))
})
It("should update the ConfigMap when message changes", func() {
webapp := &webappv1.WebApp{}
Expect(k8sClient.Get(ctx, types.NamespacedName{
Name: WebAppName,
Namespace: WebAppNamespace,
}, webapp)).To(Succeed())
webapp.Spec.Message = "Updated message via test"
Expect(k8sClient.Update(ctx, webapp)).To(Succeed())
Eventually(func() string {
cm := &corev1.ConfigMap{}
_ = k8sClient.Get(ctx, types.NamespacedName{
Name: WebAppName + "-html",
Namespace: WebAppNamespace,
}, cm)
return cm.Data["index.html"]
}, timeout, interval).Should(ContainSubstring("Updated message via test"))
})
})
})
12.03 Run the tests¶
make test
# With verbose output
make test ARGS="--v"
# Run only specific tests
go test ./internal/controller/... -run "WebApp Controller" -v
Part 13 - Build and Deploy the Operator In-Cluster¶
13.01 Write the Dockerfile¶
The scaffolded Dockerfile uses a multi-stage build:
# Build stage
FROM golang:1.22 AS builder
ARG TARGETOS
ARG TARGETARCH
WORKDIR /workspace
COPY go.mod go.sum ./
RUN go mod download
COPY cmd/ cmd/
COPY api/ api/
COPY internal/ internal/
RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} \
go build -a -o manager cmd/main.go
# Runtime stage - distroless for minimal attack surface
FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/manager .
USER 65532:65532
ENTRYPOINT ["/manager"]
13.02 Build and push the image¶
# Set your image registry
export IMG=ghcr.io/your-org/webapp-operator:v0.1.0
# Build the image
make docker-build IMG=${IMG}
# Push (requires docker login)
make docker-push IMG=${IMG}
13.03 Deploy to the cluster¶
# Deploys CRDs, RBAC, and the operator Deployment via Kustomize
make deploy IMG=${IMG}
# Verify the operator pod is running
kubectl get pods -n webapp-system
# Watch the operator logs
kubectl logs -n webapp-system \
-l control-plane=controller-manager \
-f
13.04 Apply a WebApp CR in the cluster¶
kubectl apply -f config/samples/apps_v1_webapp.yaml
kubectl get wa -A
kubectl get deployment,service,configmap -l app.kubernetes.io/managed-by=webapp-operator
13.05 Undeploy¶
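Per the Part 14 cheatsheet, everything installed by `make deploy` is removed with:

```shell
# Remove the operator Deployment, RBAC, and CRDs
make undeploy
```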
Part 14 - Quick Reference Cheatsheet¶
| Goal | Command |
|---|---|
| Init new project | kubebuilder init --domain codewizard.io --repo codewizard.io/my-op |
| Create CRD + controller | kubebuilder create api --group apps --version v1 --kind MyKind |
| Create webhook | kubebuilder create webhook --group apps --version v1 --kind MyKind --defaulting --programmatic-validation |
| Regenerate DeepCopy | make generate |
| Regenerate CRD/RBAC YAML | make manifests |
| Install CRDs to cluster | make install |
| Run controller locally | make run |
| Run tests (no cluster) | make test |
| Build operator image | make docker-build IMG=myregistry/myop:v1 |
| Push operator image | make docker-push IMG=myregistry/myop:v1 |
| Deploy to cluster | make deploy IMG=myregistry/myop:v1 |
| Undeploy | make undeploy |
| Remove CRDs | make uninstall |
| View all API resources | kubectl api-resources --api-group=apps.codewizard.io |
| Short-name get | kubectl get wa |
| Watch reconcile logs | kubectl logs -n webapp-system -l control-plane=controller-manager -f |
Exercises¶
The following exercises build on the webapp-operator created in this lab.
Exercise 01 - Add a maxUnavailable field¶
Add a MaxUnavailable field to WebAppSpec that maps to deployment.spec.strategy.rollingUpdate.maxUnavailable.
- Add the field with validation `Minimum=0` (kubebuilder markers take literal values, so enforce the `<= replicas` upper bound in the validation webhook instead)
- Default it to `1`
- Implement the mapping in `reconcileDeployment()`
- Run `make generate && make manifests`
- Test with a patch: `kubectl patch wa my-webapp --type=merge -p '{"spec":{"maxUnavailable":2}}'`
Hint
// In WebAppSpec
// +kubebuilder:validation:Minimum=0
// +kubebuilder:default=1
MaxUnavailable int32 `json:"maxUnavailable,omitempty"`
// In reconcileDeployment, set:
intMaxUnavailable := intstr.FromInt32(webapp.Spec.MaxUnavailable)
desired.Spec.Strategy = appsv1.DeploymentStrategy{
Type: appsv1.RollingUpdateDeploymentStrategyType,
RollingUpdate: &appsv1.RollingUpdateDeployment{
MaxUnavailable: &intMaxUnavailable,
},
}
Exercise 02 - Surface a URL in the status¶
Add a URL field to WebAppStatus that the controller populates with http://<service-cluster-ip>:<port>.
- Add `URL string` to `WebAppStatus`
- In `updateStatus()`, fetch the Service’s `ClusterIP` and populate `updated.Status.URL`
- Verify: `kubectl get wa my-webapp -o jsonpath='{.status.url}'`
Hint
Exercise 03 - Add a Paused field to skip reconciliation¶
Add a Paused bool field to WebAppSpec. When true, the controller exits the reconcile loop early with a log message, leaving all resources unchanged.
- Add `// +kubebuilder:default=false` and `Paused bool` to `WebAppSpec`
- In `Reconcile()`, check early: `if webapp.Spec.Paused { logger.Info("Skipping - paused"); return ctrl.Result{}, nil }`
- Test: `kubectl patch wa my-webapp --type=merge -p '{"spec":{"paused":true}}'`, then scale down manually and confirm the operator does not restore it
Exercise 04 - Add a WebAppPolicy CRD¶
Create a second API kind WebAppPolicy in the same group that defines a maxReplicas namespace-scoped limit.
In the WebAppReconciler, after fetching the WebApp, look for a WebAppPolicy in the same namespace and enforce the maxReplicas limit by clamping webapp.Spec.Replicas.
Exercise 05 - Write a webhook that prevents downscaling to 0¶
Add a validating webhook that rejects any update to a WebApp where spec.replicas was changed from > 0 to 0.
func (r *WebApp) ValidateUpdate(old runtime.Object) (admission.Warnings, error) {
oldWebApp := old.(*WebApp)
if oldWebApp.Spec.Replicas > 0 && r.Spec.Replicas == 0 {
return nil, apierrors.NewForbidden(...)
}
return r.validateWebApp()
}
Cleanup¶
# Delete sample CRs
kubectl delete wa --all
# Uninstall CRDs from the cluster
make uninstall
# If deployed in-cluster
make undeploy
# Remove the project directory
cd ..
rm -rf webapp-operator
Troubleshooting¶
- `make generate` fails with missing tool:
The Makefile auto-downloads `controller-gen`. If it fails, install it manually:
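A direct install using the Go toolchain (module path from the controller-tools project):

```shell
go install sigs.k8s.io/controller-tools/cmd/controller-gen@latest
```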
- CRD not appearing after `make install`:
Verify the CRD was generated and applied:
cat config/crd/bases/*.yaml | head -20
kubectl get crds | grep <your-domain>
kubectl describe crd <crd-name>
- Controller crashes with `cannot create resource` errors:
The RBAC markers in your controller file may be incomplete. Ensure you have //+kubebuilder:rbac markers for every resource your controller touches:
# Check the generated RBAC role
cat config/rbac/role.yaml
# Regenerate after adding markers
make manifests
make deploy IMG=${IMG}
- `make test` fails with `envtest` binary not found:
Download the envtest binaries:
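One manual route, sketched below; recent Kubebuilder scaffolds also wire this up through the Makefile, so check there first:

```shell
# Install the setup-envtest helper, then fetch the control-plane binaries
go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest
setup-envtest use
```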
- Status not updating on the CR:
Ensure you use `r.Status().Update()` (the status subresource), not `r.Update()`.
- Owner reference / garbage collection not working:
Verify `ctrl.SetControllerReference()` is called for every child resource and that `.Owns()` is set in `SetupWithManager`.
Next Steps¶
- Explore the Kubebuilder Book for advanced patterns: multi-version APIs, conversion webhooks, and external event sources.
- Try the Operator SDK which builds on Kubebuilder with Ansible and Helm-based operators.
- Browse OperatorHub.io to see real-world operators and their patterns.
- Learn about Finalizers for managing external resources during deletion.
- Combine your operator with ArgoCD (Lab 18) for GitOps-managed operator deployments.
- Practice operator tasks in the Kubernetes Kubebuilder Tasks section.
Krew - kubectl Plugin Manager¶

- Welcome to the `Krew` hands-on lab! In this tutorial, you’ll learn how to use `Krew`, the plugin manager for `kubectl`, to discover, install, and manage plugins that extend Kubernetes CLI capabilities.
- You’ll install useful plugins, explore their functionality, and learn how to build your own `kubectl` plugin.
What will we learn?¶
- What `Krew` is and why it is useful
- How to install and configure `Krew`
- How to discover, install, update, and remove `kubectl` plugins
- Essential `kubectl` plugins for daily Kubernetes work
- How to create your own `kubectl` plugin
- Troubleshooting and best practices
Official Documentation & References¶
| Resource | Link |
|---|---|
| Krew Official Site | krew.sigs.k8s.io |
| Krew User Guide | krew.sigs.k8s.io/docs/user-guide |
| Krew Plugin Index | krew.sigs.k8s.io/plugins |
| Krew GitHub Repository | github.com/kubernetes-sigs/krew |
| Writing Custom kubectl Plugins | kubernetes.io/docs/tasks/extend-kubectl |
| kubectl Plugin Discovery | kubernetes.io/docs/concepts/extend-kubectl |
Prerequisites¶
- A running Kubernetes cluster (minikube, kind, Docker Desktop, or cloud-managed)
- `kubectl` installed and configured to communicate with your cluster
- `git` installed (for plugin installation from source)
- Basic familiarity with the command line
Introduction¶
What is Krew?¶
- `Krew` is a plugin manager for `kubectl`, the Kubernetes command-line tool.
- It works similarly to `apt` for Debian, `brew` for macOS, or `npm` for Node.js - but specifically for `kubectl` plugins.
- `Krew` helps you discover, install, and manage plugins that extend `kubectl` with additional commands and capabilities.
Why use Krew?¶
- Discoverability: Browse 200+ community-maintained plugins from a centralized index
- Easy installation: Install plugins with a single command (`kubectl krew install <plugin>`)
- Version management: Update all installed plugins at once
- Cross-platform: Works on Linux, macOS, and Windows
- No sudo required: Plugins are installed in your home directory
How kubectl plugins work¶
- `kubectl` has a built-in plugin mechanism: any executable in your `PATH` named `kubectl-<plugin_name>` becomes a `kubectl` subcommand.
- For example, an executable named `kubectl-whoami` can be invoked as `kubectl whoami`.
- `Krew` manages the installation of these executables into `~/.krew/bin/`.
graph LR
A[kubectl krew install foo] --> B[Download plugin binary]
B --> C[Place in ~/.krew/bin/kubectl-foo]
C --> D[kubectl foo is now available]
style A fill:#326CE5,color:#fff
style D fill:#326CE5,color:#fff
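The naming convention above can be demonstrated without a cluster or Krew: any executable named `kubectl-<name>` on your `PATH` is a plugin. A minimal throwaway example (the `/tmp` path and `hello` name are illustrative):

```shell
# Create a toy plugin: an executable named kubectl-hello on the PATH
mkdir -p /tmp/kubectl-plugin-demo
cat > /tmp/kubectl-plugin-demo/kubectl-hello <<'EOF'
#!/usr/bin/env bash
echo "Hello from a kubectl plugin!"
EOF
chmod +x /tmp/kubectl-plugin-demo/kubectl-hello
export PATH="/tmp/kubectl-plugin-demo:$PATH"

# With kubectl installed, this is now invocable as: kubectl hello
/tmp/kubectl-plugin-demo/kubectl-hello
# Hello from a kubectl plugin!
```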
Krew Architecture¶
~/.krew/
├── bin/                 # Plugin binaries (added to PATH)
│   ├── kubectl-krew     # Krew itself
│   ├── kubectl-ctx      # Context switcher plugin
│   ├── kubectl-ns       # Namespace switcher plugin
│   └── ...
├── index/               # Plugin index (metadata)
│   └── default/
│       └── plugins/
├── receipts/            # Installation records
│   ├── ctx.yaml
│   ├── ns.yaml
│   └── ...
└── store/               # Downloaded plugin archives
Common Krew Commands¶
Below are the most common Krew commands you’ll use when working with kubectl plugins.
kubectl krew install - Install a plugin
Syntax: kubectl krew install <plugin-name>
Description: Downloads and installs a plugin from the Krew index.
- Downloads the plugin binary for your OS and architecture
- Places the binary in `~/.krew/bin/`
- The plugin becomes available as `kubectl <plugin-name>`
kubectl krew list - List installed plugins
Syntax: kubectl krew list
Description: Shows all currently installed plugins managed by Krew.
- Displays plugin name and installed version
- Only shows Krew-managed plugins (not manually installed ones)
kubectl krew search - Search for plugins
Syntax: kubectl krew search [keyword]
Description: Searches the Krew plugin index for available plugins.
- Without arguments, lists all available plugins
- With a keyword, filters plugins by name and description
- Shows plugin name, description, installed status, and stability
# List all available plugins
kubectl krew search
# Search for plugins by keyword
kubectl krew search secret
# Search for resource-related plugins
kubectl krew search resource
# Example output:
# NAME           DESCRIPTION                 INSTALLED
# view-secret    Decode Kubernetes secrets   yes
# modify-secret  Edit secrets in-place       no
kubectl krew info - Show plugin details
Syntax: kubectl krew info <plugin-name>
Description: Displays detailed information about a specific plugin.
- Shows plugin name, version, homepage, and description
- Includes supported platforms and caveats
kubectl krew update - Update the plugin index
Syntax: kubectl krew update
Description: Fetches the latest plugin index from the Krew plugin repository.
- Downloads the latest plugin metadata
- Does NOT update installed plugins
- Should be run periodically to discover new plugins
kubectl krew upgrade - Upgrade installed plugins
Syntax: kubectl krew upgrade [plugin-name]
Description: Upgrades installed plugins to their latest versions.
- Without arguments, upgrades ALL installed plugins
- With a plugin name, upgrades only that specific plugin
kubectl krew uninstall - Remove a plugin
Syntax: kubectl krew uninstall <plugin-name>
Description: Removes an installed plugin.
- Deletes the plugin binary and installation receipt
- The `kubectl <plugin-name>` command will no longer be available
Essential kubectl Plugins¶
Below is a curated list of the most useful kubectl plugins organized by category. These plugins can dramatically improve your daily Kubernetes workflow.
Cluster Navigation¶
| Plugin | Description | Usage |
|---|---|---|
| `ctx` | Switch between Kubernetes contexts quickly | `kubectl ctx <context>` |
| `ns` | Switch between namespaces quickly | `kubectl ns <namespace>` |
| `get-all` | List ALL resources in a namespace (not just common ones) | `kubectl get-all` |
Debugging & Inspection¶
| Plugin | Description | Usage |
|---|---|---|
| `debug-shell` | Create a debug container in a running pod | `kubectl debug-shell <pod>` |
| `pod-inspect` | Detailed pod inspection with events and logs | `kubectl pod-inspect <pod>` |
| `node-shell` | Open a shell into a Kubernetes node | `kubectl node-shell <node>` |
| `blame` | Show who last modified Kubernetes resources | `kubectl blame <resource> <name>` |
| `tail` | Stream logs from multiple pods (like `stern`) | `kubectl tail --ns <ns>` |
Security & Secrets¶
| Plugin | Description | Usage |
|---|---|---|
| `view-secret` | Decode and view Kubernetes secrets easily | `kubectl view-secret <secret>` |
| `modify-secret` | Edit secrets in-place with base64 encoding handled | `kubectl modify-secret <secret>` |
| `access-matrix` | Show RBAC access matrix for a resource | `kubectl access-matrix` |
| `who-can` | Show who has RBAC permissions for an action | `kubectl who-can get pods` |
| `view-cert` | View certificate details from secrets | `kubectl view-cert <secret>` |
Resource Management¶
| Plugin | Description | Usage |
|---|---|---|
| `resource-capacity` | Show resource requests/limits and utilization | `kubectl resource-capacity` |
| `view-utilization` | Show cluster resource utilization | `kubectl view-utilization` |
| `count` | Count resources by kind | `kubectl count pods` |
| `images` | List container images running in the cluster | `kubectl images` |
| `neat` | Remove clutter from Kubernetes YAML output | `kubectl get pod -o yaml \| kubectl neat` |
Networking¶
| Plugin | Description | Usage |
|---|---|---|
| `ingress-rule` | List Ingress rules across the cluster | `kubectl ingress-rule` |
| `sniff` | Capture network traffic from a pod using tcpdump/Wireshark | `kubectl sniff <pod>` |
Lab¶
Step 01 - Install Krew¶
- Before using `Krew`, you need to install it on your local machine.
(
set -x; cd "$(mktemp -d)" &&
OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
KREW="krew-${OS}_${ARCH}" &&
curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
tar zxvf "${KREW}.tar.gz" &&
./"${KREW}" install krew
)
Add Krew to PATH¶
After installation, you must add Krew to your shell PATH:
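The PATH line below is the one documented by the Krew project; append it to your `~/.bashrc` or `~/.zshrc` to make it permanent:

```shell
# Put Krew-managed plugin binaries on the PATH
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
```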
Verify Installation¶
kubectl krew version
## Expected output (version may vary):
# OPTION VALUE
# GitTag v0.4.4
# GitCommit ...
# IndexURI https://github.com/kubernetes-sigs/krew-index.git
# BasePath /home/user/.krew
# IndexPath /home/user/.krew/index/default
# InstallPath /home/user/.krew/store
# BinPath /home/user/.krew/bin
Step 02 - Update the Plugin Index¶
- Before installing plugins, update the local plugin index to get the latest available plugins:
- You should see output similar to:
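The update itself is a single command:

```shell
kubectl krew update
# Output similar to:
# Updated the local copy of plugin index.
```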
Step 03 - Discover Plugins¶
- Browse the available plugins to find tools that match your needs:
# List all available plugins (200+)
kubectl krew search
# Search for specific functionality
kubectl krew search secret
kubectl krew search debug
kubectl krew search resource
# Get detailed info about a specific plugin
kubectl krew info view-secret
Step 04 - Install Essential Plugins¶
- Install a curated set of essential plugins for daily Kubernetes work:
# Context and namespace switching
kubectl krew install ctx
kubectl krew install ns
# Secret management
kubectl krew install view-secret
# Resource inspection
kubectl krew install get-all
kubectl krew install resource-capacity
kubectl krew install count
kubectl krew install images
# RBAC
kubectl krew install access-matrix
- Verify the installed plugins:
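Listing shows each Krew-managed plugin and its installed version:

```shell
kubectl krew list
```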
Step 05 - Use Context & Namespace Switchers¶
- The `ctx` and `ns` plugins make switching between contexts and namespaces effortless:
Switch contexts with ctx¶
# List all available contexts (current context is highlighted)
kubectl ctx
# Switch to a different context
kubectl ctx my-other-cluster
# Switch back to the previous context
kubectl ctx -
Switch namespaces with ns¶
# List all namespaces (current namespace is highlighted)
kubectl ns
# Switch to a different namespace
kubectl ns kube-system
# Switch back to the previous namespace
kubectl ns -
# Verify the current namespace
kubectl config view --minify -o jsonpath='{.contexts[0].context.namespace}'
Step 06 - Inspect Secrets¶
- The `view-secret` plugin makes it easy to decode and view Kubernetes secrets:
# First, create a test secret
kubectl create secret generic demo-secret \
--from-literal=username=admin \
--from-literal=password=s3cr3t
# List all secrets in the current namespace
kubectl view-secret
# View all keys in a specific secret
kubectl view-secret demo-secret
# View a specific key (decoded automatically)
kubectl view-secret demo-secret username
# View a specific key from a specific namespace
kubectl view-secret demo-secret password -n default
# Compare with the standard kubectl approach (base64 encoded)
kubectl get secret demo-secret -o jsonpath='{.data.password}' | base64 -d
Step 07 - Explore Cluster Resources¶
Get all resources with get-all¶
- Unlike `kubectl get all`, which only shows common resources, `get-all` lists every resource in a namespace:
# List ALL resources in the current namespace
kubectl get-all
# List all resources in a specific namespace
kubectl get-all -n kube-system
Why get-all instead of kubectl get all?
kubectl get all only shows a subset of resources (Pods, Services, Deployments, ReplicaSets, etc.).
It does not show ConfigMaps, Secrets, Ingresses, ServiceAccounts, RBAC resources, CRDs, and many others.
The get-all plugin discovers and lists every resource type in the namespace.
Count resources with count¶
# Count all pods across all namespaces
kubectl count pods
# Count deployments in a specific namespace
kubectl count deployments -n kube-system
# Count all resource types
kubectl count all
View resource capacity¶
# Show node resource requests, limits, and utilization
kubectl resource-capacity
# Show with pods breakdown
kubectl resource-capacity --pods
# Show utilization percentages
kubectl resource-capacity --util
# Show specific resource type
kubectl resource-capacity --pods --util --sort cpu.util
Step 08 - Check RBAC Permissions¶
- The `access-matrix` plugin helps you understand who can do what in your cluster:
# Show access matrix for pods in the current namespace
kubectl access-matrix --for pods
# Show access matrix for all resources
kubectl access-matrix
# Show what a specific service account can do
kubectl access-matrix --sa default:default
Step 09 - List Container Images¶
- The `images` plugin shows all container images running in the cluster:
# List all images across all namespaces
kubectl images --all-namespaces
# List images in a specific namespace
kubectl images -n kube-system
# Show image columns
kubectl images --columns namespace,name,image
Step 10 - Update and Manage Plugins¶
# Update the plugin index (fetch new metadata)
kubectl krew update
# Upgrade all installed plugins to latest versions
kubectl krew upgrade
# Upgrade a specific plugin
kubectl krew upgrade ctx
# Uninstall a plugin you no longer need
kubectl krew uninstall sniff
# List installed plugins (run `kubectl krew upgrade` to upgrade any outdated ones)
kubectl krew list
Step 11 - Create Your Own kubectl Plugin¶
- Any executable in your `PATH` named `kubectl-<name>` becomes a `kubectl` plugin.
- Let’s create a simple plugin that shows pod resource usage:
Create the plugin script¶
cat << 'EOF' > kubectl-pod-status
#!/bin/bash
# kubectl-pod-status: Show a summary of pod statuses in a namespace
NAMESPACE="${1:---all-namespaces}"
if [ "$NAMESPACE" = "--all-namespaces" ]; then
NS_FLAG="--all-namespaces"
STATUS_COL=4  # columns: NAMESPACE NAME READY STATUS ...
else
NS_FLAG="-n $NAMESPACE"
STATUS_COL=3  # columns: NAME READY STATUS ...
fi
echo "=== Pod Status Summary ==="
echo ""
# Count pods by the STATUS column ($NF would give AGE, not STATUS)
kubectl get pods $NS_FLAG --no-headers 2>/dev/null | \
awk -v col="$STATUS_COL" '{print $col}' | \
sort | \
uniq -c | \
sort -rn | \
while read count status; do
printf " %-15s %s\n" "$status" "$count"
done
echo ""
echo "=== Total Pods ==="
TOTAL=$(kubectl get pods $NS_FLAG --no-headers 2>/dev/null | wc -l | tr -d ' ')
echo " Total: $TOTAL"
EOF
Install and test the plugin¶
# Make it executable
chmod +x kubectl-pod-status
# Move it to a directory in your PATH
sudo mv kubectl-pod-status /usr/local/bin/
# Verify kubectl recognizes it
kubectl plugin list
# Use your custom plugin
kubectl pod-status
kubectl pod-status kube-system
kubectl pod-status default
Clean up¶
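If you no longer want the demo plugin, remove it from your `PATH`:

```shell
# Remove the custom plugin created above
sudo rm /usr/local/bin/kubectl-pod-status
# Confirm kubectl no longer lists it
kubectl plugin list
```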
Exercises¶
The following exercises will test your understanding of Krew and kubectl plugins.
Try to solve each exercise on your own before revealing the solution.
01. Find and Install a Plugin by Use Case¶
You need to find a plugin that can show you the certificate expiration dates stored in Kubernetes secrets. Find it, install it, and test it.
Scenario:¶
- You manage a cluster with TLS certificates stored as secrets.
- You need a quick way to inspect certificate details without manual base64 decoding and openssl commands.
Hint: Use kubectl krew search cert to find relevant plugins.
Solution
# Search for certificate-related plugins
kubectl krew search cert
# Install the view-cert plugin
kubectl krew install view-cert
# Create a self-signed certificate for testing
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout tls.key -out tls.crt \
-subj "/CN=test.example.com"
# Create a TLS secret
kubectl create secret tls test-tls \
--cert=tls.crt --key=tls.key
# View the certificate details
kubectl view-cert test-tls
# Clean up
kubectl delete secret test-tls
rm tls.key tls.crt
02. Compare kubectl get all vs kubectl get-all¶
Run both commands in the kube-system namespace and document the differences. How many more resource types does get-all discover?
Scenario:¶
- You need to audit all resources in a namespace for compliance purposes.
- The standard `kubectl get all` misses many resource types.
Hint: Pipe both outputs through grep "kind:" or count the lines.
Solution
# Standard kubectl - limited resource types
kubectl get all -n kube-system 2>/dev/null | head -50
# get-all plugin - discovers ALL resource types
kubectl get-all -n kube-system 2>/dev/null | head -100
# Count resource types from standard kubectl
echo "=== kubectl get all ==="
kubectl get all -n kube-system --no-headers 2>/dev/null | wc -l
# Count resource types from get-all
echo "=== kubectl get-all ==="
kubectl get-all -n kube-system --no-headers 2>/dev/null | wc -l
# The get-all plugin typically finds 2-5x more resources including:
# - ConfigMaps, Secrets, ServiceAccounts
# - Roles, RoleBindings, ClusterRoles
# - Events, Endpoints, EndpointSlices
# - PodDisruptionBudgets, NetworkPolicies
# - Custom Resources (CRDs)
03. Audit Cluster RBAC Permissions¶
Use Krew plugins to answer: “Which service accounts in the default namespace can create Deployments?”
Scenario:¶
- As a cluster administrator, you need to audit RBAC to ensure least-privilege access.
- Understanding who can create workloads is critical for security.
Hint: Install who-can or use access-matrix to check RBAC permissions.
Solution
# Install the who-can plugin
kubectl krew install who-can
# Check who can create deployments in the default namespace
kubectl who-can create deployments -n default
# Check who can delete pods
kubectl who-can delete pods -n default
# Check who can get secrets (sensitive!)
kubectl who-can get secrets -n default
# Use access-matrix for a broader view
kubectl access-matrix --for deployments -n default
04. Check Cluster Resource Utilization¶
Install and use the resource-capacity plugin to identify nodes with the highest CPU and memory utilization. Then check if any node is over 80% utilized.
Scenario:¶
- You’re troubleshooting slow pod scheduling and suspect resource exhaustion.
- You need a quick overview of cluster capacity vs. usage.
Hint: Use kubectl resource-capacity --util --sort cpu.util.
Solution
# Install the resource-capacity plugin (if not already installed)
kubectl krew install resource-capacity
# Show basic resource capacity
kubectl resource-capacity
# Show with utilization percentages
kubectl resource-capacity --util
# Sort by CPU utilization (highest first)
kubectl resource-capacity --util --sort cpu.util
# Show per-pod breakdown
kubectl resource-capacity --pods --util
# Show resource capacity with specific output
kubectl resource-capacity --util --pod-count
# To check if any node is over 80%, examine the output percentages
# Nodes with CPU% or Memory% above 80% may need attention
05. Create a Multi-Function kubectl Plugin¶
Create a custom kubectl plugin called kubectl-cluster-info-extended that shows:
- Current context and namespace
- Node count and status
- Pod count by namespace (top 5)
- Resource utilization summary
Scenario:¶
- You want a single command that gives you a quick cluster health overview.
- This is useful as a morning check or after a deployment.
Hint: Combine multiple kubectl commands in a bash script named kubectl-cluster_info_extended.
Solution
cat << 'PLUGINEOF' > kubectl-cluster_info_extended
#!/bin/bash
# kubectl-cluster-info-extended: Quick cluster health overview
echo "============================================"
echo " Kubernetes Cluster Overview"
echo "============================================"
echo ""
# Current context
echo "--- Context & Namespace ---"
CONTEXT=$(kubectl config current-context 2>/dev/null)
NAMESPACE=$(kubectl config view --minify -o jsonpath='{.contexts[0].context.namespace}' 2>/dev/null)
echo " Context: ${CONTEXT:-N/A}"
echo " Namespace: ${NAMESPACE:-default}"
echo ""
# Node status
echo "--- Nodes ---"
kubectl get nodes --no-headers 2>/dev/null | \
awk '{printf " %-30s %-10s %s\n", $1, $2, $5}'
NODE_COUNT=$(kubectl get nodes --no-headers 2>/dev/null | wc -l | tr -d ' ')
echo " Total: $NODE_COUNT nodes"
echo ""
# Pod count by namespace (top 5)
echo "--- Pods by Namespace (Top 5) ---"
kubectl get pods --all-namespaces --no-headers 2>/dev/null | \
awk '{print $1}' | \
sort | uniq -c | sort -rn | head -5 | \
while read count ns; do
printf " %-30s %s pods\n" "$ns" "$count"
done
echo ""
# Pod status summary
echo "--- Pod Status Summary ---"
kubectl get pods --all-namespaces --no-headers 2>/dev/null | \
awk '{print $4}' | \
sort | uniq -c | sort -rn | \
while read count status; do
printf " %-15s %s\n" "$status" "$count"
done
echo ""
echo "============================================"
PLUGINEOF
# Install and test
chmod +x kubectl-cluster_info_extended
sudo mv kubectl-cluster_info_extended /usr/local/bin/
# Run the plugin (underscores in the filename map to dashes in the command name)
kubectl cluster-info-extended
# Clean up
sudo rm /usr/local/bin/kubectl-cluster_info_extended
06. Manage Plugin Lifecycle¶
Perform a full plugin lifecycle: search, install, use, update, and uninstall. Track the disk space used by Krew plugins before and after.
Scenario:¶
- You’re managing a shared jump host where disk space matters.
- You need to keep installed plugins lean and up-to-date.
Hint: Check the ~/.krew/store/ directory size with du -sh.
Solution
# Check initial disk usage
echo "=== Before ==="
du -sh ~/.krew/ 2>/dev/null || echo "Krew not yet installed"
# Update the plugin index
kubectl krew update
# Search and install a plugin
kubectl krew search neat
kubectl krew install neat
# Use the plugin
kubectl get pods -n kube-system -o name | head -1 | xargs -I{} kubectl get {} -n kube-system -o yaml | kubectl neat
# List installed plugins
kubectl krew list
# Check disk usage after install
echo "=== After Install ==="
du -sh ~/.krew/
du -sh ~/.krew/store/
# Upgrade all plugins
kubectl krew upgrade
# Uninstall the plugin
kubectl krew uninstall neat
# Check disk usage after uninstall
echo "=== After Uninstall ==="
du -sh ~/.krew/
# List remaining plugins
kubectl krew list
Finalize & Cleanup¶
- To remove all plugins installed during this lab:
# List all installed plugins
kubectl krew list
# Uninstall specific plugins
kubectl krew uninstall ctx ns view-secret get-all \
resource-capacity count images access-matrix
# Remove Krew completely (optional)
rm -rf ~/.krew
- Remove the `PATH` entry from your shell configuration file if you no longer want `Krew`.
Troubleshooting¶
- `kubectl krew` command not found:
Make sure `~/.krew/bin` is in your `PATH`. Add it to your shell configuration:
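The standard Krew `PATH` entry (add this line to your `~/.bashrc` or `~/.zshrc`):

```shell
# Put Krew's bin directory on the PATH (honors a custom KREW_ROOT if set)
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
```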
- Plugin installation fails:
Update the plugin index and try again:
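A typical retry pattern (`<plugin-name>` is a placeholder for the plugin that failed):

```shell
# Refresh the index, then retry the install
kubectl krew update
kubectl krew install <plugin-name>
```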
- Plugin not found after installation:
Verify the plugin is in the Krew bin directory and your PATH is correct:
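Two quick checks, assuming the default `~/.krew` install location:

```shell
# Plugins are symlinked into Krew's bin directory
ls -la ~/.krew/bin/
# Make sure that directory appears on your PATH
echo "$PATH" | tr ':' '\n' | grep krew
```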
- Custom plugin not recognized by kubectl:
Ensure the plugin file name follows the pattern kubectl-<name>, is executable, and is in a directory listed in your PATH:
# Check kubectl can find your plugins
kubectl plugin list
# Verify the file is executable
ls -la /usr/local/bin/kubectl-*
- Permission errors during installation:
Krew installs plugins in your home directory and should not require sudo. If you encounter permission issues:
# Check Krew directory ownership
ls -la ~/.krew/
# Fix ownership if needed
sudo chown -R $(whoami) ~/.krew/
Next Steps¶
- Explore the full Krew Plugin Index for more plugins
- Learn about writing kubectl plugins in any language
- Submit your own plugin to the Krew Index
- Combine `Krew` plugins with shell aliases for even faster workflows
- Explore Krew Custom Indexes for private plugin distribution
kubeadm - Bootstrap a Kubernetes Cluster from Scratch¶
- In this lab we will learn how to use kubeadm to bootstrap a fully functional Kubernetes cluster. We will set up a control-plane node, join worker nodes, install a CNI plugin, and verify the cluster is operational.
What will we learn?¶
- What `kubeadm` is and its role in the Kubernetes ecosystem
- How to prepare nodes (control-plane and workers) for cluster creation
- How to initialize a control-plane node with `kubeadm init`
- How to join worker nodes with `kubeadm join`
- How to install a Container Network Interface (CNI) plugin
- How to configure `kubectl` for the new cluster
- How to upgrade a cluster using `kubeadm upgrade`
- How to reset and tear down a cluster with `kubeadm reset`
- Best practices for production-grade cluster bootstrapping
Official Documentation & References¶
| Resource | Link |
|---|---|
| kubeadm Overview | kubernetes.io/docs |
| Creating a cluster with kubeadm | kubernetes.io/docs |
| kubeadm init | kubernetes.io/docs |
| kubeadm join | kubernetes.io/docs |
| kubeadm upgrade | kubernetes.io/docs |
| kubeadm reset | kubernetes.io/docs |
| Installing kubeadm | kubernetes.io/docs |
| Container Runtimes | kubernetes.io/docs |
| CNI Plugins | kubernetes.io/docs |
Prerequisites¶
- Two or more Linux machines (physical or virtual) - one for the control-plane and one or more for workers
- Each machine must have at least 2 CPUs and 2 GB RAM (recommended)
- Full network connectivity between all machines
- Unique hostname, MAC address, and `product_uuid` for each node
- Swap disabled on all nodes
- A supported container runtime installed (containerd, CRI-O, or Docker with cri-dockerd)
Lab Environment Options
You can run this lab using:
- Multipass VMs on macOS/Linux
- Vagrant + VirtualBox or libvirt
- Cloud VMs (AWS EC2, GCP Compute Engine, Azure VMs)
- Docker containers for a quick local experiment (see the included `setup-k8s.sh`)
- LXD containers on Linux
For this lab, instructions are given for Ubuntu/Debian-based systems. Adapt package commands as needed for other distributions.
kubeadm Architecture Overview¶
graph TB
subgraph control["Control-Plane Node"]
api["kube-apiserver"]
etcd["etcd"]
sched["kube-scheduler"]
cm["kube-controller-manager"]
kubelet_cp["kubelet"]
cni_cp["CNI Plugin"]
end
subgraph worker1["Worker Node 1"]
kubelet_w1["kubelet"]
proxy_w1["kube-proxy"]
cni_w1["CNI Plugin"]
pods_w1["Pods"]
end
subgraph worker2["Worker Node 2"]
kubelet_w2["kubelet"]
proxy_w2["kube-proxy"]
cni_w2["CNI Plugin"]
pods_w2["Pods"]
end
api <--> etcd
api <--> sched
api <--> cm
kubelet_cp --> api
kubelet_w1 --> api
kubelet_w2 --> api
style control fill:#326CE5,color:#fff
style worker1 fill:#4a9,color:#fff
style worker2 fill:#4a9,color:#fff
| Component | Role |
|---|---|
| `kubeadm init` | Bootstraps the control-plane node |
| `kubeadm join` | Joins a worker (or additional control-plane) node to the cluster |
| `kubeadm upgrade` | Upgrades the cluster to a newer Kubernetes version |
| `kubeadm reset` | Tears down the cluster on a node (undoes init or join) |
| `kubeadm token` | Manages bootstrap tokens for node joining |
| `kubeadm certs` | Manages cluster certificates |
01. Prepare all nodes¶
Run these steps on every node (control-plane and workers).
01.01 Disable swap¶
Kubernetes requires swap to be disabled:
# Disable swap immediately
sudo swapoff -a
# Disable swap permanently (comment out swap in fstab)
sudo sed -i '/ swap / s/^/#/' /etc/fstab
01.02 Load required kernel modules¶
# Load modules needed by containerd/Kubernetes networking
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
01.03 Set sysctl parameters for Kubernetes networking¶
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply without reboot
sudo sysctl --system
01.04 Install the container runtime (containerd)¶
# Install containerd
sudo apt-get update
sudo apt-get install -y containerd
# Generate default config
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# Enable SystemdCgroup (required for kubeadm)
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# Restart containerd
sudo systemctl restart containerd
sudo systemctl enable containerd
Why containerd?
- Since Kubernetes 1.24, dockershim was removed from the `kubelet`.
- The recommended container runtimes are containerd or CRI-O.
- If you need Docker CLI tools, install Docker separately - it will use containerd under the hood.
01.05 Install kubeadm, kubelet, and kubectl¶
# Install prerequisites
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
# Add the Kubernetes apt repository signing key
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | \
sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Add the Kubernetes apt repository
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /' | \
sudo tee /etc/apt/sources.list.d/kubernetes.list
# Install kubeadm, kubelet, and kubectl
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
# Pin their versions to prevent accidental upgrades
sudo apt-mark hold kubelet kubeadm kubectl
# Add the Kubernetes yum repository
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.31/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.31/rpm/repodata/repomd.xml.key
EOF
# Install kubeadm, kubelet, and kubectl
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
# Enable kubelet service
sudo systemctl enable kubelet
01.06 Enable the kubelet service¶
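Enable kubelet so it starts on boot (on some distributions the package already does this; repeating it is harmless):

```shell
# Enable and start the kubelet service
sudo systemctl enable --now kubelet
```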
Note
At this point the kubelet will crash-loop every few seconds - this is expected. It is waiting for kubeadm init or kubeadm join to provide its configuration.
02. Initialize the control-plane node¶
Run these steps only on the control-plane node.
02.01 Run kubeadm init¶
sudo kubeadm init \
--pod-network-cidr=10.244.0.0/16 \
--apiserver-advertise-address=<CONTROL_PLANE_IP>
Replace Placeholders
Replace <CONTROL_PLANE_IP> with the actual IP address of your control-plane node. The --pod-network-cidr must match the CIDR expected by your CNI plugin (10.244.0.0/16 for Flannel, 192.168.0.0/16 for Calico).
Expected output (excerpt):
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Then you can join any number of worker nodes by running the following
on each as root:
kubeadm join <CONTROL_PLANE_IP>:6443 --token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>
02.02 Configure kubectl¶
# Set up kubectl for the current user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
02.03 Verify control-plane status¶
# The node should appear with status NotReady (until CNI is installed)
kubectl get nodes
# Check that control-plane pods are running
kubectl get pods -n kube-system
Expected output:
NAME READY STATUS RESTARTS AGE
coredns-xxxxxxxxxx-xxxxx 0/1 Pending 0 1m
coredns-xxxxxxxxxx-xxxxx 0/1 Pending 0 1m
etcd-control-plane 1/1 Running 0 1m
kube-apiserver-control-plane 1/1 Running 0 1m
kube-controller-manager-control-plane 1/1 Running 0 1m
kube-proxy-xxxxx 1/1 Running 0 1m
kube-scheduler-control-plane 1/1 Running 0 1m
Note
CoreDNS pods will stay in Pending state until a CNI plugin is installed. This is normal.
03. Install a CNI plugin¶
A CNI (Container Network Interface) plugin is required so pods can communicate across nodes. Choose one of the following:
Note
Flannel requires --pod-network-cidr=10.244.0.0/16 to be set during kubeadm init.
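To install Flannel, apply the upstream manifest (the same one used in Exercise 01 below):

```shell
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
```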
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml
Note
If you used --pod-network-cidr=192.168.0.0/16 during kubeadm init, Calico works with zero additional config. For custom CIDRs, edit the CALICO_IPV4POOL_CIDR environment variable in the manifest.
# Install Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
curl -L --fail --remote-name-all \
https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz
# Install Cilium into the cluster
cilium install
After installing the CNI, verify all pods come up:
# Wait for all system pods to be running
kubectl get pods -n kube-system -w
# Node should now show Ready
kubectl get nodes
Expected output:
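With the CNI running, `kubectl get nodes` should report the node as Ready (values below are illustrative):

```
NAME            STATUS   ROLES           AGE   VERSION
control-plane   Ready    control-plane   10m   v1.31.x
```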
04. Join worker nodes¶
Run these steps on each worker node.
04.01 Use the join command from kubeadm init output¶
sudo kubeadm join <CONTROL_PLANE_IP>:6443 \
--token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>
Forgot the join command?
If you lost the join command, generate a new token on the control-plane node:
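On the control-plane node, this prints a complete, ready-to-run join command:

```shell
# Creates a fresh bootstrap token and prints the matching kubeadm join command
kubeadm token create --print-join-command
```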
04.02 Verify nodes from the control-plane¶
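From the control-plane node, list the cluster members:

```shell
kubectl get nodes
```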
Expected output:
NAME STATUS ROLES AGE VERSION
control-plane Ready control-plane 10m v1.31.x
worker-1 Ready <none> 2m v1.31.x
worker-2 Ready <none> 1m v1.31.x
04.03 (Optional) Label worker nodes¶
kubectl label node worker-1 node-role.kubernetes.io/worker=worker
kubectl label node worker-2 node-role.kubernetes.io/worker=worker
05. Verify the cluster¶
05.01 Deploy a test application¶
kubectl create deployment nginx-test --image=nginx --replicas=3
kubectl expose deployment nginx-test --port=80 --type=NodePort
05.02 Check deployment status¶
# All 3 replicas should be running across the nodes
kubectl get pods -o wide
# Get the NodePort
kubectl get svc nginx-test
05.03 Test connectivity¶
# Get the NodePort assigned
NODE_PORT=$(kubectl get svc nginx-test -o jsonpath='{.spec.ports[0].nodePort}')
# Test from any node (replace with an actual node IP)
curl http://<NODE_IP>:${NODE_PORT}
05.04 Run a cluster health check¶
# Check component status
kubectl get componentstatuses 2>/dev/null || kubectl get --raw='/readyz?verbose'
# Check all namespaces
kubectl get pods --all-namespaces
# Verify DNS resolution
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- \
nslookup kubernetes.default.svc.cluster.local
05.05 Clean up the test deployment¶
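Remove the resources created in the verification steps above:

```shell
kubectl delete service nginx-test
kubectl delete deployment nginx-test
```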
06. (Optional) Allow scheduling on the control-plane¶
By default, the control-plane node has a taint that prevents workload pods from being scheduled on it. For single-node clusters or development environments, remove it:
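To remove the control-plane taint (the trailing `-` deletes the taint):

```shell
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```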
Warning
Do not do this in production. The control-plane node should be dedicated to running control-plane components for stability and security.
07. Managing bootstrap tokens¶
# List existing tokens
kubeadm token list
# Create a new token (default TTL: 24h)
kubeadm token create
# Create a token with a custom TTL
kubeadm token create --ttl 2h
# Create a token and print the full join command
kubeadm token create --print-join-command
# Delete a specific token
kubeadm token delete <TOKEN>
08. Upgrade the cluster¶
To upgrade a cluster from one Kubernetes minor version to the next:
08.01 Upgrade the control-plane¶
# Update the Kubernetes repo to the target version (e.g., v1.32)
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | \
sudo tee /etc/apt/sources.list.d/kubernetes.list
# Update apt and upgrade kubeadm
sudo apt-get update
sudo apt-mark unhold kubeadm
sudo apt-get install -y kubeadm
sudo apt-mark hold kubeadm
# Verify kubeadm version
kubeadm version
# Check the upgrade plan
sudo kubeadm upgrade plan
# Apply the upgrade (replace with actual target version)
sudo kubeadm upgrade apply v1.32.0
08.02 Upgrade kubelet and kubectl on the control-plane¶
sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y kubelet kubectl
sudo apt-mark hold kubelet kubectl
# Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
08.03 Upgrade worker nodes¶
On each worker node:
# Drain the node (run from the control-plane)
kubectl drain <WORKER_NODE> --ignore-daemonsets --delete-emptydir-data
# On the worker node: upgrade kubeadm, then kubelet + kubectl
sudo apt-mark unhold kubeadm kubelet kubectl
sudo apt-get update
sudo apt-get install -y kubeadm kubelet kubectl
sudo apt-mark hold kubeadm kubelet kubectl
# Upgrade the node configuration
sudo kubeadm upgrade node
# Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
08.04 Verify the upgrade¶
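Confirm every node reports the new version:

```shell
kubectl get nodes
```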
Expected output:
NAME STATUS ROLES AGE VERSION
control-plane Ready control-plane 1d v1.32.0
worker-1 Ready worker 1d v1.32.0
worker-2 Ready worker 1d v1.32.0
09. Certificate management¶
kubeadm manages cluster certificates automatically. Here are useful commands:
# Check certificate expiration dates
sudo kubeadm certs check-expiration
# Renew all certificates
sudo kubeadm certs renew all
# Renew a specific certificate
sudo kubeadm certs renew apiserver
Certificate Validity
By default, kubeadm certificates are valid for 1 year. The CA certificate is valid for 10 years. Plan certificate renewal before expiration to avoid cluster downtime.
10. Using a kubeadm configuration file¶
Instead of passing many command-line flags, you can use a configuration file:
# manifests/kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.31.0
controlPlaneEndpoint: "control-plane:6443"
networking:
podSubnet: "10.244.0.0/16"
serviceSubnet: "10.96.0.0/12"
dnsDomain: "cluster.local"
apiServer:
extraArgs:
- name: audit-log-path
value: /var/log/kubernetes/audit.log
- name: audit-log-maxage
value: "30"
etcd:
local:
dataDir: /var/lib/etcd
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
nodeRegistration:
criSocket: unix:///var/run/containerd/containerd.sock
taints:
- key: "node-role.kubernetes.io/control-plane"
effect: "NoSchedule"
Generating a Default Config
You can generate a default configuration to customize:
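kubeadm can print its built-in defaults as a starting point for your own configuration file:

```shell
# Write the default InitConfiguration/ClusterConfiguration to a file for editing
kubeadm config print init-defaults > kubeadm-config.yaml
# Later, initialize with it:
# sudo kubeadm init --config kubeadm-config.yaml
```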
11. Reset and tear down¶
To completely remove Kubernetes from a node:
# Reset the node (removes all cluster state)
sudo kubeadm reset -f
# Clean up networking rules and CNI configs
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
sudo rm -rf /etc/cni/net.d
# Remove kubeconfig
rm -rf $HOME/.kube
# (Optional) Uninstall packages
sudo apt-mark unhold kubelet kubeadm kubectl
sudo apt-get purge -y kubelet kubeadm kubectl
Summary¶
| Concept | Key Takeaway |
|---|---|
| kubeadm init | Bootstraps a control-plane node with all required components |
| kubeadm join | Adds worker (or HA control-plane) nodes to the cluster |
| CNI plugin | Required for pod networking - install immediately after kubeadm init |
| kubeadm upgrade | Safely upgrades cluster version one minor release at a time |
| kubeadm reset | Cleanly tears down cluster state on a node |
| kubeadm token | Manages tokens for joining nodes (default 24h TTL) |
| kubeadm certs | Manages TLS certificates (default 1-year validity) |
| Swap must be off | kubelet will not start if swap is enabled |
| Container runtime | containerd or CRI-O required (dockershim removed since K8s 1.24) |
| Config file | --config flag allows declarative cluster configuration |
Exercises¶
The following exercises will test your understanding of kubeadm and cluster bootstrapping. Try to solve each exercise on your own before revealing the solution.
01. Create a single-node cluster that can run workloads¶
Create a cluster with kubeadm init that allows scheduling pods on the control-plane node.
Solution
# Initialize the cluster
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
# Set up kubeconfig
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Install CNI (Flannel)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
# Remove the control-plane taint to allow scheduling
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
# Verify the node is Ready
kubectl get nodes
02. Generate a new join token and add a worker node¶
The original join token from kubeadm init has expired. Generate a new one and join a worker.
Solution
# On the control-plane node - generate a new token with the full join command
kubeadm token create --print-join-command
# On the worker node - run the printed command, e.g.:
sudo kubeadm join 192.168.1.100:6443 \
--token abc123.abcdefghijklmnop \
--discovery-token-ca-cert-hash sha256:abc123...
# On the control-plane - verify the worker joined
kubectl get nodes
03. Check and renew cluster certificates¶
Check when the cluster certificates expire and renew the API server certificate.
Solution
# Check expiration dates for all certificates
sudo kubeadm certs check-expiration
# Renew only the API server certificate
sudo kubeadm certs renew apiserver
# Restart the API server to pick up the new cert
# (API server runs as a static pod, so restart kubelet or move the manifest)
sudo systemctl restart kubelet
# Verify the new certificate
sudo kubeadm certs check-expiration | grep apiserver
04. Perform a cluster upgrade from v1.31 to v1.32¶
Plan and execute a full cluster upgrade across control-plane and worker nodes.
Solution
# ---- Control-Plane Node ----
# 1. Update the apt repo to v1.32
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | \
sudo tee /etc/apt/sources.list.d/kubernetes.list
# 2. Upgrade kubeadm
sudo apt-get update
sudo apt-mark unhold kubeadm
sudo apt-get install -y kubeadm
sudo apt-mark hold kubeadm
# 3. Check the plan
sudo kubeadm upgrade plan
# 4. Apply the upgrade
sudo kubeadm upgrade apply v1.32.0
# 5. Upgrade kubelet and kubectl
sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y kubelet kubectl
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# ---- Each Worker Node ----
# 6. Drain the worker (from control-plane)
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
# 7. On the worker: upgrade all components
sudo apt-mark unhold kubeadm kubelet kubectl
sudo apt-get update
sudo apt-get install -y kubeadm kubelet kubectl
sudo apt-mark hold kubeadm kubelet kubectl
# 8. Upgrade the node config
sudo kubeadm upgrade node
# 9. Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# 10. Uncordon the worker (from control-plane)
kubectl uncordon worker-1
# Verify
kubectl get nodes
Troubleshooting¶
- kubelet crash-loops after `kubeadm init`:
Common causes: swap is enabled, cgroup driver mismatch, or containerd is not running.
- CoreDNS pods stuck in Pending:
A CNI plugin must be installed. CoreDNS cannot schedule without a pod network.
- Node remains NotReady:
Check that the CNI plugin pods in kube-system are running, and inspect the kubelet logs on the node.
- Token expired when trying to join:
Bootstrap tokens expire after 24 hours by default. Generate a fresh join command with kubeadm token create --print-join-command.
- `kubeadm init` fails with preflight errors:
The preflight output names the failing check (swap, busy ports, missing container runtime); fix the cause, or skip a non-critical check with --ignore-preflight-errors=<check>.
- Certificate errors after cluster has been running for a year:
kubeadm certificates are valid for 1 year by default; check with kubeadm certs check-expiration and renew with kubeadm certs renew all.
- cgroup driver mismatch:
Ensure containerd is configured with SystemdCgroup = true, then restart both containerd and kubelet.
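A few generic diagnostics that help with most of the issues above (commands assume systemd-based nodes):

```shell
# kubelet logs usually name the exact failure (swap, cgroups, CRI socket)
sudo journalctl -u kubelet --no-pager -n 50
# Confirm the container runtime is up
sudo systemctl status containerd --no-pager
# Re-run the preflight checks without initializing a cluster
sudo kubeadm init phase preflight
```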
Next Steps¶
- Set up a highly available (HA) control-plane with multiple control-plane nodes and an external load balancer - see HA topology
- Add etcd encryption at rest for secrets - see Encrypting Secret Data
- Configure audit logging to track API server requests
- Set up RBAC (see Lab 31 - RBAC) for fine-grained access control
- Explore kubeadm phases (`kubeadm init phase`) for granular control over cluster bootstrapping
- Consider managed alternatives for production: EKS, GKE, or AKS if cluster management overhead is too high
Telepresence¶
What will we learn?¶
- What Telepresence is and why it’s useful for Kubernetes development
- How to install and configure Telepresence on your system
- Creating and managing different types of service intercepts
- Debugging applications locally while connected to a cluster
- Team collaboration using personal intercepts and preview URLs
- Best practices for cloud-native development workflows
Introduction¶
- Telepresence is a powerful tool that allows developers to code and test microservices locally while connecting to a remote Kubernetes cluster
- It bridges the gap between local development and cloud-native environments, enabling fast feedback loops
- Eliminates the need to continuously build, push, and deploy containers during development
What is Telepresence?¶
- Telepresence is an open-source tool that creates a bi-directional network proxy between your local development machine and a Kubernetes cluster
Telepresence Capabilities
Local Development with Cluster Access
- Run a single service locally while connecting to remote Kubernetes services
- Debug services using your local IDE, debuggers, and development tools
- Test integrations with other services running in the cluster
- Develop without waiting for container builds and deployments
- Work with realistic data and dependencies from the cluster
Architecture Overview¶
graph TD
A[Local Machine] --> B[Telepresence Client]
B -->|Network Tunnel| C[Kubernetes Cluster]
C --> D[Traffic Manager]
D --> E[Your Service - Local]
D --> F[Other Services]
C --> G[Database]
C --> H[Message Queue]
C --> I[APIs]
F --> G
F --> H
F --> I
style A fill:#667eea
style B fill:#764ba2
style C fill:#48bb78
style D fill:#ed8936
style E fill:#f6ad55
Terminology¶
| Term | Description |
|---|---|
| Traffic Manager | Controller deployed in the cluster that manages intercepts and routing |
| Intercept | Route traffic from a service in the cluster to your local machine |
| Global Intercept | All traffic to a service goes to your local machine |
| Personal Intercept | Only traffic matching specific headers goes to your local machine |
| Preview URL | Shareable URL that routes to your local development environment |
Key Features¶
-
Fast Inner Loop Development
- Edit code locally and see changes immediately.
- No container rebuilds required.
- Instant feedback on code changes.
-
Full Network Access
- Access cluster resources as if running in the cluster.
- Connect to databases, message queues, and other services.
- Test with production-like data and dependencies.
-
Service Intercepts
- Route cluster traffic to your local machine.
- Debug production issues safely.
- Test changes without affecting teammates.
-
Preview URLs
- Share your local changes with stakeholders.
- Get feedback without deploying.
- Collaborate seamlessly.
-
Personal Intercepts
- Header-based traffic routing.
- Multiple developers on the same service. Safe parallel development.
-
Volume Mounts
Access ConfigMaps and Secrets locally. Test with real cluster configurations. Develop with production-like settings.
Prerequisites¶
Required Setup
- Running Kubernetes cluster (Minikube, Kind, or cloud provider)
- kubectl configured and working
- Admin access to the cluster (for Traffic Manager installation)
- Docker Desktop or Docker Engine running
- Internet connectivity
- Code editor (VS Code recommended)
System Requirements¶
| Requirement | Specification |
|---|---|
| OS | macOS, Linux, or Windows (WSL2) |
| Memory | At least 4GB RAM available |
| Disk | 2GB free space |
| Network | Stable internet connection |
Installation¶
Step 01 - Install Telepresence CLI¶
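The install snippet is missing here; a minimal sketch for Linux and macOS, assuming the Ambassador Labs download URL and Homebrew tap that the upstream docs used at the time of writing:

```shell
# Linux (x86_64): download the telepresence binary and make it executable
# (URL is an assumption based on the upstream docs; adjust for your platform)
sudo curl -fL https://app.getambassador.io/download/tel2/linux/amd64/latest/telepresence \
  -o /usr/local/bin/telepresence
sudo chmod a+x /usr/local/bin/telepresence

# macOS alternative (Homebrew):
# brew install datawire/blackbird/telepresence

# Verify the installation
telepresence version
```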
Step 02 - Verify Cluster Access¶
# Check cluster connectivity
kubectl cluster-info
# Check your context
kubectl config current-context
# Verify you have admin permissions
kubectl auth can-i create deployments --all-namespaces
Step 03 - Connect Telepresence¶
# Connect to the cluster
# This installs the Traffic Manager in your cluster
telepresence connect
# Check connection status
telepresence status
# List available services
telepresence list
Expected Output
Step 04 - Verify DNS Access¶
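Once connected, cluster DNS names resolve directly from your machine. A quick hedged check (any ClusterIP service name in your cluster works):

```shell
# Resolve an in-cluster service name from your local machine
nslookup kubernetes.default.svc.cluster.local

# Or hit a service over the tunnel (example name; substitute your own service)
curl -sk https://kubernetes.default.svc.cluster.local
```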
Demo Architecture¶
Our demo application consists of three microservices:
graph LR
A[Frontend<br/>Nginx] -->|HTTP| B[Backend<br/>Python/Flask]
B -->|HTTP| C[Data Service<br/>Python/Flask]
style A fill:#48bb78
style B fill:#ed8936
style C fill:#667eea
Service Description
Frontend (Nginx)
- Web UI for the application
- Proxies API calls to backend
Backend (Python/Flask) - INTERCEPT POINT
- REST API with multiple endpoints
- Communicates with data service
- This is the service we’ll intercept
Data Service (Python/Flask)
- Provides metrics and sample data
- Simulates data layer
Lab Setup¶
Step 01 - Quick Setup¶
Use the automated setup script:
What does setup.sh do?
- Creates the telepresence-demo namespace
- Deploys all three microservices
- Waits for services to be ready
- Displays access information
Step 02 - Manual Setup (Alternative)¶
If you prefer manual setup:
# Create namespace
kubectl create namespace telepresence-demo
# Set default namespace
kubectl config set-context --current --namespace=telepresence-demo
# Deploy services
kubectl apply -f resources/01-namespace.yaml
kubectl apply -f resources/02-dataservice.yaml
kubectl apply -f resources/03-backend.yaml
kubectl apply -f resources/04-frontend.yaml
# Wait for pods to be ready
kubectl wait --for=condition=ready pod --all --timeout=120s
Step 03 - Verify Deployment¶
# Check all resources
kubectl get all -n telepresence-demo
# Test frontend
kubectl port-forward -n telepresence-demo svc/frontend 8080:80
Then open your browser to http://localhost:8080
Exercise 1: Basic Intercept¶
Goal
Intercept the backend service and route all traffic to your local development environment.
Step 1 - Prepare Local Environment¶
# Navigate to backend source code
cd resources/backend-app
# Create Python virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
Step 2 - Connect Telepresence¶
# Ensure you're connected
telepresence connect
# List available services
telepresence list --namespace telepresence-demo
Step 3 - Create Intercept¶
# Intercept the backend service on port 5000
telepresence intercept backend \
--port 5000 \
--namespace telepresence-demo
Intercept Created
Step 4 - Run Local Service¶
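The run command is missing here; a sketch assuming the Flask app in resources/backend-app listens on port 5000 (the port used by the intercept):

```shell
# From resources/backend-app, with the virtualenv active
export FLASK_DEBUG=1   # enable auto-reload so edits apply immediately
python app.py          # the app must listen on 0.0.0.0:5000
```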
Step 5 - Test the Intercept¶
# In a new terminal, forward the frontend
kubectl port-forward -n telepresence-demo svc/frontend 8080:80
# Open http://localhost:8080 in your browser
# All backend requests now go to your local machine!
Step 6 - Make Live Changes¶
Try This
- Edit resources/backend-app/app.py
- Modify a response message
- Save the file (Flask auto-reloads in debug mode)
- Refresh your browser
- See your changes immediately!
Step 7 - Remove Intercept¶
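When you are done, remove the intercept so cluster traffic flows back to the in-cluster backend:

```shell
# Stop routing backend traffic to your machine
telepresence leave backend

# Confirm nothing is intercepted anymore
telepresence list --namespace telepresence-demo
```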
Exercise 2: Preview URLs¶
Goal
Create a shareable preview URL for your local changes.
Prerequisites¶
- Ambassador Cloud account (free tier available)
- Internet-accessible cluster
Step 1 - Login to Ambassador Cloud¶
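Preview URLs require an Ambassador Cloud session; logging in opens a browser window to authenticate:

```shell
telepresence login
```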
Step 2 - Create Preview Intercept¶
# Create an intercept with preview URL
telepresence intercept backend \
--port 5000 \
--namespace telepresence-demo \
--preview-url=true
Step 3 - Share and Test¶
- Copy the preview URL
- Share it with teammates or stakeholders
- Make changes to your local code
- Others see changes in real-time via the preview URL!
Exercise 3: Global Intercept¶
Goal
Route ALL traffic for a service to your local machine.
Use with Caution
Global intercepts affect all users of the service. Only use in development environments or when you have exclusive access.
# Create a global intercept
telepresence intercept backend \
--port 5000 \
--namespace telepresence-demo
# All traffic to backend now goes to localhost:5000
Use Cases:
- Testing breaking changes
- Debugging production-like issues
- Performance testing with real load
Exercise 4: Personal Intercept¶
Goal
Only intercept requests that match specific HTTP headers. This allows multiple developers to work on the same service simultaneously.
Step 1 - Create Selective Intercept¶
# Intercept only requests with specific header
telepresence intercept backend \
--port 5000 \
--namespace telepresence-demo \
--http-match=auto
Step 2 - Get Your Intercept ID¶
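The intercept header value is printed when the intercept is created; you can also recover it afterwards (exact output format varies by Telepresence version):

```shell
# Active intercepts, including the x-telepresence-intercept-id header value
telepresence status
telepresence list --namespace telepresence-demo
```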
Step 3 - Test with Header¶
# Requests WITHOUT your header go to cluster
curl http://backend.telepresence-demo.svc.cluster.local:5000/api/health
# Requests WITH your header go to your local machine
curl -H "x-telepresence-intercept-id: [your-id]" \
http://backend.telepresence-demo.svc.cluster.local:5000/api/health
Step 4 - Browser Testing¶
Install a browser extension to add custom headers:
- ModHeader (Chrome/Firefox)
- Modify Headers (Firefox)
Add the header with your intercept ID, and only your browser will see your local changes!
Advanced Features¶
Environment Variables¶
Capture cluster environment variables:
# Intercept and capture environment
telepresence intercept backend \
--port 5000 \
--namespace telepresence-demo \
--env-file=.env.cluster
# Load them in your app
source .env.cluster
python app.py
Volume Mounts¶
Access remote volumes locally:
# Intercept with volume mounts
telepresence intercept backend \
--port 5000 \
--namespace telepresence-demo \
--mount=true
# Volumes mounted at:
# ~/telepresence/[namespace]/[pod-name]/[volume-name]
Docker Mode¶
Run your local service in Docker:
# Intercept and connect Docker container
telepresence intercept backend \
--port 5000 \
--namespace telepresence-demo \
--docker-run -- \
-v $(pwd):/app \
my-backend-image:dev
Common Commands Reference¶
| Command | Description |
|---|---|
telepresence connect |
Connect to cluster |
telepresence status |
Show connection status |
telepresence list |
List available services |
telepresence intercept SERVICE --port PORT |
Create basic intercept |
telepresence leave SERVICE |
Remove specific intercept |
telepresence leave --all |
Remove all intercepts |
telepresence quit |
Disconnect from cluster |
telepresence loglevel debug |
Enable debug logging |
Troubleshooting¶
Connection Issues¶
# Check status
telepresence status
# Reconnect
telepresence quit
telepresence connect
# Check logs
telepresence loglevel debug
DNS Not Working¶
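A hedged first check: verify that cluster names resolve at all, and if not, reset the connection (reconnecting re-establishes the DNS proxy):

```shell
# Verify cluster DNS resolution from your machine
nslookup backend.telepresence-demo.svc.cluster.local

# If resolution fails, reconnect to reset the DNS proxy
telepresence quit
telepresence connect
```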
Intercept Not Working¶
# Check intercept status
telepresence list
# View traffic manager logs
kubectl logs -n ambassador deployment/traffic-manager
Port Conflicts¶
# Use different local port
telepresence intercept backend \
--port 5001:5000 \
--namespace telepresence-demo
More Help
See TROUBLESHOOTING.md for detailed solutions to common issues.
Best Practices¶
Development Workflow
Keep Connection Active
- Keep telepresence connected during development sessions
Use Personal Intercepts
- Avoid disrupting teammates in shared environments
Environment Parity
- Capture and use cluster environment variables
Clean Up
- Always remove intercepts when done
- Use telepresence leave --all
Security
- Be cautious with sensitive environment variables
- Don’t expose internal services unnecessarily
- Use preview URLs with expiration times
- Follow your organization’s security policies
Performance
- Run heavy services (databases) in cluster
- Understand that remote calls have network latency
- Use selective intercepts to minimize overhead
- Maintain stable internet connection
Cleanup¶
Remove Intercepts¶
# Leave specific intercept
telepresence leave backend
# Leave all intercepts
telepresence leave --all
Disconnect from Cluster¶
# Disconnect but leave traffic manager
telepresence quit
# Disconnect and remove traffic manager
telepresence uninstall --everything
Delete Demo Resources¶
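Deleting the demo namespace removes all three services and their pods in one step:

```shell
kubectl delete namespace telepresence-demo
```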
Additional Resources¶
Learn More
- Official Documentation
- GitHub Repository
- Community Slack
- EXAMPLES.md - 12 practical examples
- TROUBLESHOOTING.md - Detailed problem solving
Key Takeaways¶
What You Learned
✅ Telepresence bridges local development and Kubernetes clusters
✅ Enables fast inner-loop development without container rebuilds
✅ Supports multiple intercept types for different scenarios
✅ Works with existing development tools and IDEs
✅ Dramatically improves developer productivity
✅ Essential tool for modern cloud-native development
Next Steps¶
- Experiment with different intercept types
- Integrate Telepresence into your CI/CD pipeline
- Create team workflows using personal intercepts
- Explore EXAMPLES.md for more use cases
- Set up automated testing with Telepresence
- Configure Telepresence for your specific stack
Happy Coding!
KEDA - Kubernetes Event-Driven Autoscaling¶
- KEDA is a Kubernetes-based Event Driven Autoscaler that extends the native Kubernetes HorizontalPodAutoscaler (HPA).
- It allows you to scale any container in Kubernetes based on the number of events from virtually any event source - queues, streams, databases, HTTP traffic, cron schedules, and more.
- KEDA is a CNCF Graduated project (since 2023), widely adopted and production-proven.
What will we learn?¶
- What KEDA is and how it differs from native HPA
- How KEDA architecture works (Operator, Metrics Adapter, Scalers)
- KEDA core CRDs: ScaledObject, ScaledJob, TriggerAuthentication
- How to install KEDA via Helm
- Scale to zero and scale from zero using event-driven triggers
- Real-world scalers: CPU/Memory, Cron, Redis, Kafka, HTTP, Prometheus
- Using TriggerAuthentication with Kubernetes Secrets
- Scaling Jobs (not Deployments) with ScaledJob
- Combining KEDA with ArgoCD (Lab 18) for GitOps-managed autoscaling
Official Documentation & References¶
| Resource | Link |
|---|---|
| KEDA Official Documentation | keda.sh |
| KEDA Scalers Reference | keda.sh/docs/scalers |
| KEDA on ArtifactHub (Helm) | artifacthub.io/keda |
| KEDA GitHub Repository | github.com/kedacore/keda |
| KEDA HTTP Add-on | github.com/kedacore/http-add-on |
| CNCF Project Page | cncf.io/projects/keda |
The Problem KEDA Solves¶
Native HPA Limitations¶
Kubernetes’ built-in HorizontalPodAutoscaler only scales on CPU and memory metrics (or custom metrics via the Metrics API, which is complex to set up). This means you can’t natively:
- Scale a worker deployment to zero when a job queue is empty
- Scale up when a Kafka topic has unread messages
- Scale based on a cron schedule (e.g., double capacity every weekday morning)
- Scale based on a Redis list length, database row count, or HTTP request rate
graph LR
hpa["Native HPA"] -- "Only" --> cpu["CPU Metrics"]
hpa -- "Only" --> mem["Memory Metrics"]
hpa -- "Complex setup" --> custom["Custom Metrics API"]
keda["KEDA"] -- "50+ scalers" --> kafka["Kafka"]
keda --> rabbitmq["RabbitMQ"]
keda --> redis["Redis"]
keda --> cron["Cron Schedule"]
keda --> http["HTTP Traffic"]
keda --> prometheus["Prometheus"]
keda --> aws["AWS SQS / SNS"]
keda --> azure["Azure Service Bus"]
keda --> gcp["GCP Pub/Sub"]
keda --> more["..."]
Scale-to-Zero: The Game Changer¶
KEDA’s most powerful feature is scale-to-zero: when there are no events, pods scale down to 0 replicas, saving resources. When events arrive, KEDA scales back up instantly.
| Scenario | Without KEDA | With KEDA |
|---|---|---|
| Idle queue worker | 1-3 pods always running | 0 pods (scale to zero) |
| Morning traffic spike | Manual scaling or slow HPA | Pre-warmed via Cron scaler |
| Kafka consumer lag | Fixed replica count | Dynamic scaling on lag metric |
| Batch job | Long-running Deployment | Short-lived Jobs, scaled by queue depth |
KEDA Architecture¶
graph TB
subgraph cluster["Kubernetes Cluster"]
subgraph keda_ns["keda namespace"]
operator["KEDA Operator\n(keda-operator)"]
metrics["KEDA Metrics Adapter\n(keda-operator-metrics-apiserver)"]
hooks["KEDA Admission Webhooks\n(keda-admission-webhooks)"]
end
subgraph app_ns["app namespace"]
so["ScaledObject CRD"]
sj["ScaledJob CRD"]
ta["TriggerAuthentication CRD"]
deployment["Deployment / StatefulSet"]
end
hpa_k8s["Kubernetes HPA\n(managed by KEDA)"]
k8s_api["Kubernetes API Server"]
end
subgraph external["External Event Sources"]
kafka_ext["Kafka Cluster"]
redis_ext["Redis"]
rabbitmq_ext["RabbitMQ"]
prom_ext["Prometheus"]
cron_ext["Cron Schedule"]
end
so --> operator
sj --> operator
ta --> operator
operator -- "Creates & manages" --> hpa_k8s
operator -- "Queries metrics" --> external
hpa_k8s -- "Scales" --> deployment
metrics -- "Exposes custom metrics" --> k8s_api
k8s_api --> hpa_k8s
kafka_ext --> operator
redis_ext --> operator
rabbitmq_ext --> operator
prom_ext --> operator
cron_ext --> operator
KEDA Components¶
| Component | Description |
|---|---|
| keda-operator | Watches ScaledObject/ScaledJob CRDs; creates/manages HPA objects; polls event sources |
| keda-operator-metrics-apiserver | Exposes custom metrics to the Kubernetes Metrics API so native HPA can read them |
| keda-admission-webhooks | Validates KEDA CRDs on admission (prevents misconfigurations) |
How KEDA Works (Step by Step)¶
sequenceDiagram
participant Dev as Developer
participant Git as Git/kubectl
participant K8s as Kubernetes API
participant KEDA as KEDA Operator
participant Scaler as Event Source (e.g. Redis)
participant HPA as Kubernetes HPA
participant Pod as Application Pods
Dev->>Git: Apply ScaledObject manifest
Git->>K8s: Create ScaledObject CRD
K8s-->>KEDA: ScaledObject admitted & stored
KEDA->>K8s: Create/update HPA for the target Deployment
loop Every polling interval (default 30s)
KEDA->>Scaler: Query metric (e.g. Redis list length)
Scaler-->>KEDA: Current value (e.g. 150 messages)
KEDA->>K8s: Update HPA with current metric value
K8s->>HPA: HPA calculates desired replicas
HPA->>Pod: Scale Deployment up/down
end
KEDA Terminology¶
| Term | Kind | Description |
|---|---|---|
| ScaledObject | CRD | Links a Deployment/StatefulSet/custom workload to one or more scalers. KEDA creates a managed HPA for it. |
| ScaledJob | CRD | Like ScaledObject but for Kubernetes Jobs - creates one Job per event (or batches) instead of scaling pods |
| TriggerAuthentication | CRD | Stores authentication configs (secrets, pod identity) for scalers that need credentials |
| ClusterTriggerAuthentication | CRD | Same as TriggerAuthentication but cluster-scoped (reusable across namespaces) |
| Scaler | Built-in | A plugin inside KEDA that knows how to query a specific event/metric source |
| Trigger | Config | A single scaler configuration inside a ScaledObject/ScaledJob |
| minReplicaCount | Config | Minimum replicas (can be 0 for scale-to-zero) |
| maxReplicaCount | Config | Maximum replicas KEDA is allowed to scale to |
| cooldownPeriod | Config | Seconds KEDA waits after last event before scaling back to minReplicaCount |
| pollingInterval | Config | How often KEDA queries the scaler (default: 30 seconds) |
Available Scalers (50+)¶
KEDA ships with scalers for virtually every major event/metric source:
| Category | Scalers |
|---|---|
| Message Queues | Apache Kafka, RabbitMQ, Azure Service Bus, AWS SQS, GCP Pub/Sub, NATS JetStream, IBM MQ |
| Databases | Redis (List/Stream/Cluster/Sentinel), PostgreSQL, MySQL, MSSQL, MongoDB, CouchDB |
| Storage | AWS S3, Azure Blob Storage, GCS Bucket |
| Monitoring | Prometheus, Datadog, Graphite, InfluxDB, New Relic |
| HTTP | HTTP Add-on (external component) |
| Compute | CPU, Memory (same as HPA but combined with other scalers) |
| Time | Cron |
| Cloud Native | ArgoCD, KEDA HTTP Add-on, Kubernetes Event-driven Jobs |
| Cloud-Specific | Azure Event Hub, Azure Log Analytics, AWS CloudWatch, GCP Stackdriver |
Directory Structure¶
34-Keda/
├── README.md                           # This file
├── scripts/
│   ├── install.sh                      # Install KEDA via Helm
│   └── demo.sh                         # Full automated demo
└── manifests/
    ├── 00-namespace.yaml               # Namespace for demo workloads
    ├── 01-demo-deployment.yaml         # A simple nginx deployment to scale
    ├── 02-scaled-object-cpu.yaml       # ScaledObject: CPU-based scaling
    ├── 03-scaled-object-cron.yaml      # ScaledObject: Cron-based scheduling
    ├── 04-redis-stack.yaml             # Redis deployment for queue demo
    ├── 05-scaled-object-redis.yaml     # ScaledObject: Redis List scaler
    ├── 06-trigger-auth.yaml            # TriggerAuthentication with Secret
    ├── 07-scaled-object-redis-auth.yaml # ScaledObject with auth
    ├── 08-prometheus-scaler.yaml       # ScaledObject: Prometheus scaler
    └── 09-scaled-job.yaml              # ScaledJob: batch job per queue message
Prerequisites¶
- Kubernetes cluster (v1.24+)
- kubectl configured to access your cluster
- Helm 3.x installed
Installation¶
Part 01 - Install KEDA via Helm¶
Helm is the recommended installation method for KEDA.
01. Add the KEDA Helm repository¶
helm repo add kedacore https://kedacore.github.io/charts
helm repo update kedacore
# Confirm available charts
helm search repo kedacore/keda
Expected output:
02. Install KEDA¶
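The install command is missing here; the standard Helm install using the kedacore repository added above (the keda namespace is the common convention):

```shell
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace
```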
03. Verify the installation¶
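Check that all three KEDA pods are up in the keda namespace:

```shell
kubectl get pods -n keda
```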
Expected output (all Running):
NAME READY STATUS RESTARTS
keda-admission-webhooks-xxxx 1/1 Running 0
keda-operator-xxxx 1/1 Running 0
keda-operator-metrics-apiserver-xxxx 1/1 Running 0
04. Verify KEDA CRDs are registered¶
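List the CRDs that the chart registered:

```shell
kubectl get crd | grep keda.sh
```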
Expected output:
clustertriggerauthentications.keda.sh
scaledjobs.keda.sh
scaledobjects.keda.sh
triggerauthentications.keda.sh
05. Verify the metrics API is available¶
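KEDA registers an external-metrics APIService; verify it reports Available (the APIService name below is the one KEDA's metrics adapter serves):

```shell
kubectl get apiservice v1beta1.external.metrics.k8s.io
```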
Expected:
Part 02 - Install KEDA via kubectl (Alternative)¶
# Install KEDA using the official release manifest
kubectl apply --server-side \
-f https://github.com/kedacore/keda/releases/latest/download/keda-2.x.x.yaml
Note
Replace 2.x.x with the latest KEDA version from github.com/kedacore/keda/releases.
Core Concepts & Labs¶
Part 03 - Your First ScaledObject (CPU Scaler)¶
The CPU scaler is the simplest way to start with KEDA - it works like HPA but lets you combine it with other KEDA scalers.
Understanding the ScaledObject¶
A ScaledObject has three key sections:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: my-scaler
spec:
# 1. What to scale
scaleTargetRef:
name: my-deployment # Must match a Deployment/StatefulSet name
# 2. Scaling bounds
minReplicaCount: 1 # 0 = scale to zero
maxReplicaCount: 10
# 3. When to scale (triggers)
triggers:
- type: cpu # Scaler type
metadata:
type: Utilization # AverageValue or Utilization
value: "60" # Scale when CPU > 60%
Lab: Deploy a workload and scale it on CPU¶
Step 01 - Create the namespace and a demo deployment:
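A sketch using the manifest files from the directory listing above:

```shell
kubectl apply -f manifests/00-namespace.yaml
kubectl apply -f manifests/01-demo-deployment.yaml
```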
Verify:
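```shell
kubectl get deployment nginx-demo -n keda-demo
kubectl get pods -n keda-demo
```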
Step 02 - Apply the CPU ScaledObject:
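```shell
kubectl apply -f manifests/02-scaled-object-cpu.yaml
```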
Step 03 - Verify KEDA created an HPA:
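```shell
# KEDA creates a managed HPA named keda-hpa-<scaledobject-name>
kubectl get hpa -n keda-demo
```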
Expected:
Step 04 - Generate CPU load and watch scaling:
# Terminal 1: Watch pods
kubectl get pods -n keda-demo -w
# Terminal 2: Generate CPU load
kubectl run -it --rm load-generator \
--image=busybox \
--namespace=keda-demo \
--restart=Never \
-- /bin/sh -c "while true; do wget -q -O- http://nginx-demo:80; done"
Step 05 - Inspect the ScaledObject status:
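```shell
kubectl get scaledobject -n keda-demo
kubectl describe scaledobject -n keda-demo
```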
Part 04 - Cron Scaler (Scheduled Scaling)¶
The Cron scaler lets you define time windows with specific replica counts. This is ideal for predictable traffic patterns - e.g., pre-warm your API servers every weekday morning.
ScaledObject with Cron Trigger¶
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: nginx-cron-scaler
namespace: keda-demo
spec:
scaleTargetRef:
name: nginx-demo
minReplicaCount: 1 # Night/off-hours minimum
maxReplicaCount: 10
triggers:
- type: cron
metadata:
timezone: "Asia/Jerusalem" # Any valid IANA timezone
start: "0 8 * * 1-5" # Weekdays at 08:00
end: "0 18 * * 1-5" # Weekdays at 18:00
desiredReplicas: "5" # Scale to 5 during business hours
Multiple Cron Triggers¶
You can combine multiple cron triggers for different time windows:
triggers:
# Business hours: Mon-Fri, 08:00-18:00 → 5 replicas
- type: cron
metadata:
timezone: "Asia/Jerusalem"
start: "0 8 * * 1-5"
end: "0 18 * * 1-5"
desiredReplicas: "5"
# Lunch peak: Mon-Fri, 12:00-14:00 → 8 replicas
- type: cron
metadata:
timezone: "Asia/Jerusalem"
start: "0 12 * * 1-5"
end: "0 14 * * 1-5"
desiredReplicas: "8"
# Weekend reduced: Sat-Sun, 10:00-16:00 → 2 replicas
- type: cron
metadata:
timezone: "Asia/Jerusalem"
start: "0 10 * * 0,6" # cron days of week: 0=Sun, 6=Sat
end: "0 16 * * 0,6"
desiredReplicas: "2"
How multiple triggers work
When multiple triggers are active at the same time, KEDA uses the maximum desired replica count across all active triggers.
Lab: Apply the Cron ScaledObject¶
kubectl apply -f manifests/03-scaled-object-cron.yaml
# Check the current replica count
kubectl get scaledobject nginx-cron-scaler -n keda-demo
# Inspect the details including the active trigger
kubectl describe scaledobject nginx-cron-scaler -n keda-demo
Part 05 - Scale to Zero with Redis Queue Scaler¶
The Redis List scaler monitors a Redis list length and scales the consumer Deployment up (or from 0) when there are items in the queue - and back down to zero when the queue is empty.
This is the classic “worker pool” autoscaling pattern:
graph LR
producer["Producer\n(pushes jobs to queue)"] --> redis_list["Redis List\n(jobs:queue)"]
redis_list --> keda_op["KEDA Operator\n(polls list length)"]
keda_op -- "length > 0 → scale up" --> workers["Worker Pods\n(0 → N replicas)"]
keda_op -- "length == 0 → scale to zero" --> zero["0 Pods\n(cost savings)"]
workers -- "LPOP jobs" --> redis_list
Step 01 - Deploy Redis¶
kubectl apply -f manifests/04-redis-stack.yaml
# Wait for Redis to be ready
kubectl rollout status deployment/redis -n keda-demo
Step 02 - Deploy a Worker Deployment (starts at 0 replicas)¶
The worker deployment starts at 0 replicas - KEDA will scale it up when jobs arrive:
# Part of manifests/04-redis-stack.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis-worker
namespace: keda-demo
spec:
replicas: 0 # Start at zero - KEDA controls this
selector:
matchLabels:
app: redis-worker
template:
metadata:
labels:
app: redis-worker
spec:
containers:
- name: worker
image: redis:7-alpine
# Simulates a worker: pops one job, sleeps 2s, repeat
command: ["/bin/sh", "-c"]
args:
- |
while true; do
JOB=$(redis-cli -h redis LPOP jobs:queue)
if [ -n "$JOB" ]; then
echo "Processing: $JOB"
sleep 2
else
sleep 1
fi
done
Step 03 - Apply the Redis ScaledObject¶
The ScaledObject:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: redis-worker-scaler
namespace: keda-demo
spec:
scaleTargetRef:
name: redis-worker
minReplicaCount: 0 # Scale to ZERO when queue is empty
maxReplicaCount: 20
cooldownPeriod: 30 # Wait 30s after last job before scaling to zero
pollingInterval: 5 # Check every 5 seconds
triggers:
- type: redis
metadata:
address: redis:6379 # Redis host:port (within cluster)
listName: jobs:queue # The Redis list to monitor
listLength: "5" # One replica per 5 items in the queue
Step 04 - Verify scale-to-zero¶
# Should show 0 pods (no jobs in queue yet)
kubectl get pods -n keda-demo -l app=redis-worker
kubectl get scaledobject redis-worker-scaler -n keda-demo
Step 05 - Enqueue jobs and watch scale-up¶
# Terminal 1: Watch pods
kubectl get pods -n keda-demo -l app=redis-worker -w
# Terminal 2: Push 50 jobs to the queue
kubectl exec -it deployment/redis -n keda-demo -- \
redis-cli RPUSH jobs:queue \
job-1 job-2 job-3 job-4 job-5 \
job-6 job-7 job-8 job-9 job-10 \
job-11 job-12 job-13 job-14 job-15 \
job-16 job-17 job-18 job-19 job-20 \
job-21 job-22 job-23 job-24 job-25 \
job-26 job-27 job-28 job-29 job-30 \
job-31 job-32 job-33 job-34 job-35 \
job-36 job-37 job-38 job-39 job-40 \
job-41 job-42 job-43 job-44 job-45 \
job-46 job-47 job-48 job-49 job-50
# Check queue length
kubectl exec -it deployment/redis -n keda-demo -- redis-cli LLEN jobs:queue
Observe the sequence of events:
1. KEDA detects 50 jobs in the queue (50 / 5 = 10 replicas desired)
2. Pods scale up from 0 → 10
3. Workers consume the jobs
4. The queue drains, and the worker pods scale back down to 0
Part 06 - TriggerAuthentication¶
Many scalers require credentials to connect to external services (password, token, connection string). TriggerAuthentication prevents putting secrets directly in the ScaledObject.
Creating a TriggerAuthentication¶
graph LR
secret["Kubernetes Secret\n(redis-auth-secret)"] --> ta["TriggerAuthentication\n(redis-auth)"]
ta --> so["ScaledObject"]
so --> keda_op["KEDA Operator"]
keda_op -- "Reads credentials\nfrom Secret" --> redis_ext["Redis with\npassword auth"]
Step 01 - Create a Secret:
kubectl create secret generic redis-auth-secret \
--namespace keda-demo \
--from-literal=redis-password='super-secret-password'
Step 02 - Create the TriggerAuthentication (references the Secret):
# manifests/06-trigger-auth.yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: redis-auth
namespace: keda-demo
spec:
secretTargetRef:
- parameter: password # The scaler parameter this maps to
name: redis-auth-secret # Kubernetes Secret name
key: redis-password # Key within the Secret
Step 03 - Reference it in the ScaledObject:
# manifests/07-scaled-object-redis-auth.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: redis-auth-worker-scaler
namespace: keda-demo
spec:
scaleTargetRef:
name: redis-worker
minReplicaCount: 0
maxReplicaCount: 10
triggers:
- type: redis
authenticationRef:
name: redis-auth # Reference to TriggerAuthentication
metadata:
address: redis:6379
listName: jobs:queue
listLength: "5"
Apply:
kubectl apply -f manifests/06-trigger-auth.yaml
kubectl apply -f manifests/07-scaled-object-redis-auth.yaml
ClusterTriggerAuthentication (Cluster-Wide)¶
For credentials used across multiple namespaces:
apiVersion: keda.sh/v1alpha1
kind: ClusterTriggerAuthentication
metadata:
name: global-redis-auth # No namespace needed
spec:
secretTargetRef:
- parameter: password
name: redis-auth-secret # Secret must exist in the KEDA namespace
key: redis-password
Reference with kind:
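The YAML fragment is missing here; a trigger referencing the cluster-scoped auth looks like this (kind defaults to TriggerAuthentication, so it must be set explicitly):

```yaml
triggers:
  - type: redis
    authenticationRef:
      name: global-redis-auth
      kind: ClusterTriggerAuthentication  # defaults to TriggerAuthentication if omitted
    metadata:
      address: redis:6379
      listName: jobs:queue
      listLength: "5"
```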
Part 07 - Prometheus Scaler¶
The Prometheus scaler lets you scale based on any Prometheus metric - custom application metrics, business metrics, or infrastructure metrics.
# manifests/08-prometheus-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: prometheus-scaler
namespace: keda-demo
spec:
scaleTargetRef:
name: nginx-demo
minReplicaCount: 1
maxReplicaCount: 20
triggers:
- type: prometheus
metadata:
# Prometheus server URL (in-cluster)
serverAddress: http://prometheus-server.monitoring.svc:9090
# The PromQL query to evaluate
# This example scales on HTTP request rate
query: |
sum(rate(http_requests_total{namespace="keda-demo"}[1m]))
# Scale threshold: add one replica per 100 req/sec
threshold: "100"
# Optional: activation threshold (below this = stay at minReplicaCount)
activationThreshold: "10"
PromQL Tips for KEDA
- The query must return a single scalar value
- Use activationThreshold to prevent scaling when traffic is very low
- KEDA uses the formula: desiredReplicas = ceil(metricValue / threshold)
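The ceiling formula can be sanity-checked with plain shell integer arithmetic (the values below are illustrative, not from the lab):

```shell
# desiredReplicas = ceil(metricValue / threshold), via integer ceiling division
metric=151     # e.g. 151 req/sec measured by the PromQL query
threshold=5    # one replica per 5 req/sec
desired=$(( (metric + threshold - 1) / threshold ))
echo "$desired"   # prints 31
```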
Common Prometheus Scaling Patterns¶
# Scale on response latency (p95 > 500ms)
query: |
histogram_quantile(0.95,
sum(rate(http_request_duration_seconds_bucket{
namespace="keda-demo"
}[2m])) by (le)
) * 1000
# Scale on queue depth from application metric
query: |
myapp_queue_depth{namespace="keda-demo"}
# Scale on active WebSocket connections
query: |
sum(websocket_active_connections{namespace="keda-demo"})
Part 08 - ScaledJob (Batch Processing)¶
ScaledJob is designed for batch workloads where each event should be processed by its own short-lived Kubernetes Job (not a long-running pod).
Use cases:
- Video/image transcoding (one Job per file)
- Report generation (one Job per report request)
- ML batch inference (one Job per data chunk)
ScaledObject vs ScaledJob¶
| Feature | ScaledObject | ScaledJob |
|---|---|---|
| Target | Deployment / StatefulSet | Kubernetes Job |
| Scaling model | Adjust replica count | Create new Jobs per event |
| Idle behavior | Scale to zero replicas | No Jobs running |
| Best for | Long-running workers | Short-lived batch tasks |
| Parallelism | All pods share the workload | Each Job handles its own event |
ScaledJob Example¶
# manifests/09-scaled-job.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
name: redis-batch-job
namespace: keda-demo
spec:
jobTargetRef:
# This Job template is instantiated for each batch of events
parallelism: 1
completions: 1
backoffLimit: 2
template:
spec:
restartPolicy: Never
containers:
- name: batch-processor
image: redis:7-alpine
command: ["/bin/sh", "-c"]
args:
- |
echo "Batch job started"
# Pop and process up to 5 items from the queue
for i in $(seq 1 5); do
JOB=$(redis-cli -h redis LPOP batch:queue)
if [ -n "$JOB" ]; then
echo "Processing batch item: $JOB"
sleep 1
fi
done
echo "Batch job done"
# Scaling configuration
minReplicaCount: 0 # No Jobs when queue is empty
maxReplicaCount: 50 # At most 50 parallel Jobs
pollingInterval: 10
successfulJobsHistoryLimit: 5
failedJobsHistoryLimit: 5
# Scaling strategy
scalingStrategy:
strategy: "default" # "default", "custom", or "accurate"
# customScalingQueueLengthDeduction: 0
# customScalingRunningJobPercentage: "0.5"
triggers:
- type: redis
metadata:
address: redis:6379
listName: batch:queue
listLength: "5" # One Job per 5 items
Push items and watch Jobs¶
# Push 25 items to the batch queue
kubectl exec -it deployment/redis -n keda-demo -- \
redis-cli RPUSH batch:queue \
batch-1 batch-2 batch-3 batch-4 batch-5 \
batch-6 batch-7 batch-8 batch-9 batch-10 \
batch-11 batch-12 batch-13 batch-14 batch-15 \
batch-16 batch-17 batch-18 batch-19 batch-20 \
batch-21 batch-22 batch-23 batch-24 batch-25
# Watch Jobs being created
kubectl get jobs -n keda-demo -w
# Watch the ScaledJob
kubectl get scaledjob -n keda-demo
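As a sanity check on what to expect: with listLength: "5" the redis trigger asks for roughly one Job per five queued items (the exact number also depends on the scalingStrategy and on Jobs already running), so the 25 items pushed above should produce about 5 Jobs. A quick shell sketch of that estimate:

```shell
# Rough estimate of pending Jobs: ceil(queue_length / list_length),
# capped by maxReplicaCount (50 here, so the cap does not apply)
queue_length=25
list_length=5
jobs=$(( (queue_length + list_length - 1) / list_length ))
echo "Estimated Jobs for ${queue_length} items: ${jobs}"
```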
Part 09 - Scaling Behavior Tuning¶
KEDA inherits HPA’s scaling behavior configuration, giving you fine-grained control over how fast pods scale up and down.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: tuned-scaler
namespace: keda-demo
spec:
scaleTargetRef:
name: nginx-demo
minReplicaCount: 1
maxReplicaCount: 20
# Advanced HPA scaling behavior tuning
advanced:
# How quickly to scale UP
horizontalPodAutoscalerConfig:
behavior:
scaleUp:
stabilizationWindowSeconds: 0 # React immediately to scale up
policies:
- type: Pods
value: 4 # Add at most 4 pods per period
periodSeconds: 15
- type: Percent
value: 100 # Or double the pod count
periodSeconds: 15
selectPolicy: Max # Use whichever adds more pods
scaleDown:
stabilizationWindowSeconds: 120 # Wait 2 minutes before scaling down
policies:
- type: Pods
value: 2 # Remove at most 2 pods per period
periodSeconds: 60
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-server.monitoring.svc:9090
query: sum(rate(http_requests_total{namespace="keda-demo"}[1m]))
threshold: "100"
Common Tuning Patterns¶
| Pattern | Config | Use Case |
|---|---|---|
| Aggressive scale-up, slow scale-down | `scaleUp.stabilizationWindowSeconds: 0`, `scaleDown.stabilizationWindowSeconds: 300` | Spiky traffic: respond fast, avoid flapping |
| Gradual scale-up | `scaleUp.policies: [{type: Pods, value: 2, periodSeconds: 60}]` | Expensive pods, avoid overwhelming downstreams |
| No downscale | `scaleDown: {selectPolicy: Disabled}` | Stateful workloads, long-lived connections |
| Fast scale-down | `scaleDown.stabilizationWindowSeconds: 0` | Short-lived jobs, cost optimization |
Part 10 - Kafka Scaler¶
The Kafka scaler scales consumers based on consumer group lag: the number of messages published to a topic that the consumer group has not yet processed.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kafka-consumer-scaler
namespace: keda-demo
spec:
scaleTargetRef:
name: kafka-consumer
minReplicaCount: 0
maxReplicaCount: 20
triggers:
- type: kafka
metadata:
bootstrapServers: kafka-broker:9092
consumerGroup: my-consumer-group # The consumer group to monitor
topic: my-topic # The topic to watch
lagThreshold: "10" # One replica per 10 unprocessed messages
offsetResetPolicy: latest # "latest" or "earliest"
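To cross-check the lag KEDA sees, you can query Kafka directly with its bundled tooling. A sketch, assuming a Kafka broker pod named kafka-0 in the demo namespace (adjust the pod name for your Kafka deployment):

```shell
# Describe the consumer group to see per-partition lag
# (the LAG column is what the KEDA Kafka scaler sums up)
kubectl exec -it kafka-0 -n keda-demo -- \
  kafka-consumer-groups.sh \
  --bootstrap-server kafka-broker:9092 \
  --describe --group my-consumer-group
```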
Kafka with SASL/TLS Authentication¶
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: kafka-auth
namespace: keda-demo
spec:
secretTargetRef:
- parameter: sasl # "plaintext" | "scram_sha256" | "scram_sha512"
name: kafka-credentials
key: sasl-type
- parameter: username
name: kafka-credentials
key: username
- parameter: password
name: kafka-credentials
key: password
- parameter: tls
name: kafka-credentials
key: tls-enabled # "enable" | "disable"
Part 11 - HTTP Scaler (KEDA HTTP Add-on)¶
The HTTP scaler requires installing the separate KEDA HTTP Add-on. It intercepts HTTP traffic and scales the target service based on request rate (including scale to zero).
Install the HTTP Add-on¶
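The add-on ships as its own Helm chart in the kedacore repository. A sketch of the installation, assuming KEDA core is already installed in the keda namespace (the release name http-add-on is a common convention, not a requirement):

```shell
# Add the KEDA Helm repo (skip if already added)
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# Install the HTTP Add-on alongside KEDA
helm install http-add-on kedacore/keda-add-ons-http --namespace keda

# Verify the interceptor and external-scaler pods are running
kubectl get pods -n keda
```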
HTTPScaledObject¶
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: nginx-http-scaler
namespace: keda-demo
spec:
hosts:
- nginx-demo.keda-demo.svc # The Kubernetes service hostname
pathPrefixes:
- / # Scale on all paths (optional filter)
scaledownPeriod: 300 # Scale to zero after 5 minutes of no traffic
scaleTargetRef:
deployment: nginx-demo
service: nginx-demo
port: 80
replicas:
min: 0 # Scale to zero when no HTTP traffic
max: 10
scalingMetric:
requestRate:
targetValue: 100 # One replica per 100 req/sec
granularity: 1s
window: 1m
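Note that the add-on only counts requests that pass through its interceptor proxy, so clients (or your Ingress) must route traffic there rather than directly to the Service. A quick in-cluster test, assuming the add-on's default service name and port (keda-add-ons-http-interceptor-proxy on 8080 in the keda namespace):

```shell
# Run a throwaway pod and send a request through the interceptor,
# using a Host header that matches spec.hosts above
kubectl run curl-test -n keda-demo -it --rm --restart=Never \
  --image=curlimages/curl -- \
  curl -s -H "Host: nginx-demo.keda-demo.svc" \
  http://keda-add-ons-http-interceptor-proxy.keda.svc:8080/
```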
Part 12 - Combining Multiple Triggers¶
KEDA allows multiple triggers in a single ScaledObject. The scaling decision uses the maximum desired replica count across all active triggers.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: multi-trigger-scaler
namespace: keda-demo
spec:
scaleTargetRef:
name: nginx-demo
minReplicaCount: 1
maxReplicaCount: 20
triggers:
# Trigger 1: CPU-based (baseline HPA-like behavior)
- type: cpu
metadata:
type: Utilization
value: "60"
# Trigger 2: Scale during business hours
- type: cron
metadata:
timezone: "UTC"
start: "0 8 * * 1-5"
end: "0 18 * * 1-5"
desiredReplicas: "5"
# Trigger 3: Queue depth
- type: redis
metadata:
address: redis:6379
listName: jobs:queue
listLength: "10"
Multiple Trigger Evaluation
KEDA evaluates ALL triggers simultaneously and scales to whichever trigger demands the most replicas. If CPU wants 3, cron wants 5, and Redis wants 8, KEDA scales to 8.
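A minimal shell sketch of that max rule (the per-trigger replica counts are hypothetical):

```shell
# Hypothetical desired replica counts computed by each trigger
cpu_wants=3
cron_wants=5
redis_wants=8

# The KEDA-managed HPA takes the maximum across all active triggers
desired=$cpu_wants
for v in $cron_wants $redis_wants; do
  if [ "$v" -gt "$desired" ]; then
    desired=$v
  fi
done
echo "Scaling to ${desired} replicas"
```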
Part 13 - KEDA with ArgoCD (GitOps)¶
Managing ScaledObject and TriggerAuthentication through ArgoCD brings the GitOps benefits of version control, drift detection, and automatic reconciliation to your autoscaling configs.
ArgoCD Application for KEDA Resources¶
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: keda-autoscaling
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/nirgeier/KubernetesLabs.git
targetRevision: HEAD
path: Labs/34-Keda/manifests # All ScaledObject manifests
destination:
server: https://kubernetes.default.svc
namespace: keda-demo
syncPolicy:
automated:
prune: true
selfHeal: true # ArgoCD reverts manual ScaledObject changes
syncOptions:
- CreateNamespace=true
When you push a new or updated ScaledObject manifest to Git, ArgoCD automatically applies it and KEDA starts the new scaling behavior - no manual kubectl needed.
Workflow¶
sequenceDiagram
participant Dev as Developer
participant Git as Git Repository
participant ArgoCD as ArgoCD
participant K8s as Kubernetes
participant KEDA as KEDA Operator
Dev->>Git: Push updated ScaledObject YAML
ArgoCD->>Git: Polls for changes (or webhook)
Git-->>ArgoCD: New ScaledObject detected
ArgoCD->>K8s: Apply ScaledObject
K8s->>KEDA: KEDA Operator notified
KEDA->>K8s: Creates/updates managed HPA
Note over KEDA,K8s: KEDA now scales based on new trigger config
Part 14 - Monitoring KEDA¶
KEDA Metrics (Prometheus)¶
KEDA exposes its own metrics that you can scrape with Prometheus:
# Check KEDA metrics endpoint
kubectl port-forward svc/keda-operator-metrics-apiserver -n keda 8080:8080 &
curl http://localhost:8080/metrics
Key metrics:
| Metric | Description |
|---|---|
| `keda_scaler_active` | Whether a scaler is currently active (1=active, 0=inactive) |
| `keda_scaler_metrics_value` | The current metric value from a scaler |
| `keda_scaler_errors_total` | Number of errors encountered by a scaler |
| `keda_scaled_object_paused` | Whether a ScaledObject is paused |
| `keda_resource_totals` | Number of KEDA CRD resources |
ServiceMonitor for Prometheus Operator¶
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: keda-metrics
namespace: keda
labels:
release: prometheus # Must match your Prometheus release label
spec:
selector:
matchLabels:
app: keda-operator-metrics-apiserver
endpoints:
- port: metrics
interval: 30s
path: /metrics
Useful Grafana Dashboard¶
Import KEDA community dashboard from Grafana.com: Dashboard ID 16543
# Quick verification with kubectl
kubectl get scaledobjects -A
kubectl get scaledjobs -A
# Describe for events and conditions
kubectl describe scaledobject <name> -n <namespace>
# Check KEDA operator logs
kubectl logs -n keda -l app=keda-operator --tail=50
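These metrics can also drive alerting. A sketch of a Prometheus Operator rule that fires when any scaler reports errors (rule and alert names here are illustrative; the release label must match your Prometheus installation, as with the ServiceMonitor above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-alerts            # illustrative name
  namespace: keda
  labels:
    release: prometheus        # must match your Prometheus release label
spec:
  groups:
    - name: keda
      rules:
        - alert: KedaScalerErrors
          expr: rate(keda_scaler_errors_total[5m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "A KEDA scaler is reporting errors"
```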
Part 15 - Pause and Resume Scaling¶
Sometimes you need to temporarily stop KEDA from scaling (e.g., during maintenance):
# Pause autoscaling at a fixed replica count
# (KEDA scales the target to 2 replicas and stops reconciling triggers)
kubectl annotate scaledobject nginx-cron-scaler \
-n keda-demo \
autoscaling.keda.sh/paused-replicas="2"
# Resume (delete the annotation)
kubectl annotate scaledobject nginx-cron-scaler \
-n keda-demo \
autoscaling.keda.sh/paused-replicas-
# Pause all scaling on a ScaledObject
kubectl annotate scaledobject nginx-cron-scaler \
-n keda-demo \
autoscaling.keda.sh/paused=true
# Resume
kubectl annotate scaledobject nginx-cron-scaler \
-n keda-demo \
autoscaling.keda.sh/paused-
Part 16 - Fallback Configuration¶
Fallback allows KEDA to use a safe replica value if the scaler fails to query the metric source:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: resilient-scaler
namespace: keda-demo
spec:
scaleTargetRef:
name: nginx-demo
minReplicaCount: 1
maxReplicaCount: 10
# Fallback: if the metric source is unavailable, use these settings
fallback:
failureThreshold: 3 # Fail 3 times before using fallback replicas
replicas: 3 # Fallback to 3 replicas when metric unavailable
triggers:
- type: redis
metadata:
address: redis:6379
listName: jobs:queue
listLength: "5"
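To check whether fallback has kicked in, inspect the ScaledObject status, where KEDA tracks consecutive metric failures per trigger (the exact status layout may vary between KEDA versions):

```shell
# Per-trigger failure counts; fallback engages once a trigger's
# failure count reaches spec.fallback.failureThreshold
kubectl get scaledobject resilient-scaler -n keda-demo \
  -o jsonpath='{.status.health}'

# The fallback state also shows up in describe output
kubectl describe scaledobject resilient-scaler -n keda-demo | grep -i fallback
```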
Part 17 - KEDA CLI (kubectl-keda Plugin)¶
The keda plugin for kubectl simplifies common KEDA operations:
Install¶
# macOS via Homebrew
brew tap kedacore/keda
brew install keda
# Or via Krew (kubectl plugin manager)
kubectl krew install keda
Common Commands¶
# List all ScaledObjects across all namespaces
kubectl keda list scaledobjects -A
# List ScaledJobs
kubectl keda list scaledjobs -A
# Check metric values for a ScaledObject
kubectl keda get scaledobject nginx-cron-scaler -n keda-demo
# Show events for a ScaledObject
kubectl keda describe scaledobject nginx-cron-scaler -n keda-demo
# Pause / resume a ScaledObject
kubectl keda pause scaledobject nginx-cron-scaler -n keda-demo
kubectl keda resume scaledobject nginx-cron-scaler -n keda-demo
Part 18 - Troubleshooting¶
ScaledObject Not Scaling¶
# 1. Check ScaledObject status and conditions
kubectl describe scaledobject <name> -n <namespace>
# Look for conditions like:
# Ready: True/False
# Active: True/False
# 2. Check KEDA operator logs
kubectl logs -n keda -l app=keda-operator --tail=100
# 3. Check the HPA managed by KEDA
kubectl get hpa -n <namespace>
kubectl describe hpa keda-hpa-<name> -n <namespace>
# 4. Check if metric is being received
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Scale to Zero Not Working¶
# Verify minReplicaCount is 0 in ScaledObject
kubectl get scaledobject <name> -n <namespace> -o yaml | grep minReplicaCount
# Check the cooldownPeriod hasn't passed yet
kubectl describe scaledobject <name> -n <namespace> | grep -A5 "Conditions"
# Verify the metric/queue is truly empty
kubectl exec -it deployment/redis -n keda-demo -- redis-cli LLEN jobs:queue
Metric Source Connectivity Issues¶
# Check KEDA can reach the metric source
kubectl run debug-pod --image=busybox -n keda -it --rm --restart=Never \
-- sh -c "nc -zv redis.keda-demo.svc.cluster.local 6379"
# Verify TriggerAuthentication is correct
kubectl describe triggerauthentication <name> -n <namespace>
# Check if secrets referenced in TriggerAuthentication exist
kubectl get secret <secret-name> -n <namespace>
KEDA Webhook Errors¶
# Check admission webhook
kubectl get validatingwebhookconfigurations | grep keda
kubectl describe validatingwebhookconfiguration keda-admission
# Restart KEDA webhooks
kubectl rollout restart deployment/keda-admission-webhooks -n keda
Common KEDA Cheatsheet¶
# --- Installation ---
helm repo add kedacore https://kedacore.github.io/charts
helm upgrade --install keda kedacore/keda --namespace keda --create-namespace --wait
# --- Inspect ---
kubectl get scaledobjects -A # List all ScaledObjects
kubectl get scaledjobs -A # List all ScaledJobs
kubectl get triggerauthentications -A # List TriggerAuths
kubectl describe scaledobject <name> -n <ns> # Full details + events
kubectl get hpa -n <ns> # KEDA-managed HPAs
# --- Troubleshoot ---
kubectl logs -n keda -l app=keda-operator --tail=100 # Operator logs
kubectl logs -n keda -l app=keda-operator-metrics-apiserver --tail=50
# --- Pause / Resume ---
kubectl annotate scaledobject <name> -n <ns> autoscaling.keda.sh/paused=true
kubectl annotate scaledobject <name> -n <ns> autoscaling.keda.sh/paused-
# --- ScaledObject quick template ---
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: my-scaler
namespace: default
spec:
scaleTargetRef:
name: my-deployment
minReplicaCount: 0
maxReplicaCount: 10
triggers:
- type: redis
metadata:
address: redis:6379
listName: my:queue
listLength: "5"
EOF
Exercises¶
The following exercises will test your understanding of KEDA concepts. Try to solve each exercise on your own before revealing the solution.
01. Scale a Deployment Based on a Custom Redis Key¶
Create a ScaledObject that monitors a separate Redis list used as a priority queue. Use the redis scaler with listName pointing to the new key, and scale from 0 to 5 replicas.
Scenario:¶
- Your application uses a dedicated Redis list as a priority job queue.
- You want workers to scale up when high-priority jobs are enqueued.
Hint: Create a new Redis list key and a corresponding ScaledObject with minReplicaCount: 0.
Solution
# 1. Create a ScaledObject for a different queue
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: priority-worker-scaler
namespace: keda-demo
spec:
scaleTargetRef:
name: redis-worker
minReplicaCount: 0
maxReplicaCount: 5
cooldownPeriod: 30
pollingInterval: 5
triggers:
- type: redis
metadata:
address: redis:6379
listName: priority:queue
listLength: "3"
EOF
# 2. Push items to the priority queue
kubectl exec -it deployment/redis -n keda-demo -- \
redis-cli RPUSH priority:queue job-a job-b job-c job-d job-e job-f
# 3. Watch pods scale up
kubectl get pods -n keda-demo -l app=redis-worker -w
# 4. Verify ScaledObject
kubectl get scaledobject priority-worker-scaler -n keda-demo
02. Combine Cron and CPU Triggers¶
Create a ScaledObject that uses both a Cron trigger (scale to 3 during business hours) and a CPU trigger (scale beyond 3 when CPU exceeds 70%).
Scenario:¶
- Your API needs a baseline of 3 pods during work hours but should burst higher under load.
- Outside business hours, the minimum can drop to 1.
Hint: Use multiple triggers in a single ScaledObject. KEDA uses the maximum across active triggers.
Solution
# Apply this ScaledObject
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: hybrid-scaler
namespace: keda-demo
spec:
scaleTargetRef:
name: nginx-demo
minReplicaCount: 1
maxReplicaCount: 10
triggers:
- type: cron
metadata:
timezone: "UTC"
start: "0 8 * * 1-5"
end: "0 18 * * 1-5"
desiredReplicas: "3"
- type: cpu
metadata:
type: Utilization
value: "70"
03. Use TriggerAuthentication with a Kubernetes Secret¶
Create a TriggerAuthentication that references a Kubernetes Secret containing a Redis password, then create a ScaledObject that uses it.
Scenario:¶
- Your production Redis requires authentication.
- You need to keep credentials out of the ScaledObject manifest.
Hint: Create a Secret, then a TriggerAuthentication referencing it, then a ScaledObject with authenticationRef.
Solution
# 1. Create the Secret
kubectl create secret generic my-redis-secret \
--namespace keda-demo \
--from-literal=redis-password='my-secure-password'
# 2. Create the TriggerAuthentication
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: my-redis-auth
namespace: keda-demo
spec:
secretTargetRef:
- parameter: password
name: my-redis-secret
key: redis-password
EOF
# 3. Create the ScaledObject with authenticationRef
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: auth-redis-scaler
namespace: keda-demo
spec:
scaleTargetRef:
name: redis-worker
minReplicaCount: 0
maxReplicaCount: 10
triggers:
- type: redis
authenticationRef:
name: my-redis-auth
metadata:
address: redis:6379
listName: secure:queue
listLength: "5"
EOF
# 4. Verify
kubectl get triggerauthentication -n keda-demo
kubectl describe scaledobject auth-redis-scaler -n keda-demo
04. Create a ScaledJob for Batch Processing¶
Create a ScaledJob that spawns one Kubernetes Job for every 3 items in a Redis list, with a maximum of 10 concurrent Jobs.
Scenario:¶
- Your data pipeline receives files for processing; each Job should handle a small batch.
- When the queue is empty, no Jobs should be running.
Hint: Use kind: ScaledJob with jobTargetRef and the redis trigger.
Solution
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
name: batch-processor
namespace: keda-demo
spec:
jobTargetRef:
parallelism: 1
completions: 1
backoffLimit: 2
template:
spec:
restartPolicy: Never
containers:
- name: processor
image: redis:7-alpine
command: ["/bin/sh", "-c"]
args:
- |
for i in $(seq 1 3); do
JOB=$(redis-cli -h redis LPOP processing:queue)
if [ -n "$JOB" ]; then
echo "Processing: $JOB"
sleep 2
fi
done
echo "Batch complete"
minReplicaCount: 0
maxReplicaCount: 10
pollingInterval: 10
successfulJobsHistoryLimit: 5
failedJobsHistoryLimit: 3
triggers:
- type: redis
metadata:
address: redis:6379
listName: processing:queue
listLength: "3"
# Apply the ScaledJob
kubectl apply -f - <<EOF
# (paste the YAML above)
EOF
# Push items to trigger Job creation
kubectl exec -it deployment/redis -n keda-demo -- \
redis-cli RPUSH processing:queue file-1 file-2 file-3 file-4 file-5 file-6 file-7 file-8 file-9
# Watch Jobs being created
kubectl get jobs -n keda-demo -w
# Check the ScaledJob
kubectl get scaledjob -n keda-demo
05. Pause and Resume a ScaledObject¶
Pause an active ScaledObject at a fixed replica count, then resume normal scaling.
Scenario:¶
- You need to perform maintenance on the metric source and want to keep a stable replica count.
- After maintenance, resume event-driven scaling.
Hint: Use the autoscaling.keda.sh/paused-replicas annotation.
Solution
# 1. Pause the ScaledObject at 3 replicas
kubectl annotate scaledobject redis-worker-scaler \
-n keda-demo \
autoscaling.keda.sh/paused-replicas="3"
# 2. Verify it's paused
kubectl get scaledobject redis-worker-scaler -n keda-demo -o yaml | grep -A2 annotations
# 3. Check that replicas stay at 3 regardless of queue depth
kubectl get deployment redis-worker -n keda-demo
# 4. Resume normal scaling
kubectl annotate scaledobject redis-worker-scaler \
-n keda-demo \
autoscaling.keda.sh/paused-replicas-
# 5. Verify scaling resumes
kubectl describe scaledobject redis-worker-scaler -n keda-demo
Cleanup¶
# Remove all KEDA demo resources
kubectl delete namespace keda-demo
# Uninstall KEDA
helm uninstall keda --namespace keda
kubectl delete namespace keda
# Remove KEDA CRDs
kubectl delete crd \
scaledobjects.keda.sh \
scaledjobs.keda.sh \
triggerauthentications.keda.sh \
clustertriggerauthentications.keda.sh
Summary¶
| Concept | Key Takeaway |
|---|---|
| ScaledObject | Links Deployments to any event source; KEDA manages an HPA for you |
| ScaledJob | Creates a new Job per event batch; ideal for batch processing |
| TriggerAuthentication | Externalizes credentials from ScaledObject specs |
| Scale to Zero | Set minReplicaCount: 0 for complete cost savings when idle |
| Multiple Triggers | Combine triggers; KEDA uses whichever demands the most replicas |
| Fallback | Define safe replica counts when the metric source is unreachable |
| Pause | Temporarily halt KEDA scaling without removing resources |
| GitOps | Manage ScaledObjects via ArgoCD for full GitOps autoscaling |
Next Steps¶
- Explore the full KEDA Scalers catalog - 60+ event sources including AWS SQS, GCP Pub/Sub, and Azure Service Bus.
- Try the KEDA HTTP Add-on for HTTP-based scale-to-zero.
- Combine KEDA with ArgoCD (Lab 18) for GitOps-managed autoscaling configurations.
- Learn about Prometheus & Grafana (Lab 15) for monitoring KEDA metrics.
- Explore the KEDA community Grafana dashboard for visualizing scaling behavior.
- Practice KEDA tasks in the Kubernetes KEDA Tasks section.
crictl - Container Runtime Interface CLI¶
- Welcome to the `crictl` hands-on lab! In this tutorial, you will learn how to use `crictl`, the command-line interface for CRI-compatible container runtimes. `crictl` is an essential debugging and inspection tool that operates at the container runtime level, giving you visibility into what is happening beneath the Kubernetes API layer.
- Unlike `kubectl`, which communicates with the Kubernetes API server, `crictl` talks directly to the container runtime (such as `containerd` or `CRI-O`) on a specific node.
Node-Level Tool
crictl runs directly on Kubernetes nodes, not from your local workstation.
All commands in this lab assume you have SSH access to a Kubernetes node (control plane or worker).
You must run these commands as root or with sudo privileges.
What will we learn?¶
- What the Container Runtime Interface (CRI) is and why it exists
- How `crictl` fits into the Kubernetes architecture
- How to install and configure `crictl` on a Kubernetes node
- How to list, inspect, and filter pods and containers at the runtime level
- How to view container logs directly from the runtime
- How to execute commands inside running containers
- How to manage container images at the runtime level
- How to monitor resource usage with runtime-level stats
- How to query runtime information for debugging
- How to manually create pod sandboxes and containers (advanced debugging)
- When to use `crictl` versus `kubectl` and other container CLI tools
Official Documentation & References¶
| Resource | Link |
|---|---|
| crictl Official Repository | github.com/kubernetes-sigs/cri-tools |
| CRI Specification | kubernetes.io/docs/concepts/architecture/cri |
| crictl User Guide | github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md |
| Kubernetes Container Runtimes | kubernetes.io/docs/setup/production-environment/container-runtimes |
| containerd Documentation | containerd.io/docs |
| CRI-O Documentation | cri-o.io |
| Debugging Kubernetes Nodes | kubernetes.io/docs/tasks/debug/debug-cluster/crictl |
| Migrating from Docker to containerd | kubernetes.io/docs/tasks/administer-cluster/migrating-from-dockershim |
| OCI Runtime Specification | github.com/opencontainers/runtime-spec |
Introduction¶
What is CRI (Container Runtime Interface)?¶
- The Container Runtime Interface (CRI) is a plugin interface that allows the kubelet to use a wide variety of container runtimes without needing to recompile the kubelet itself.
- CRI defines a set of gRPC services that a container runtime must implement so that the kubelet can manage pods and containers on a node.
- Before CRI existed, Kubernetes was tightly coupled to Docker. CRI decoupled Kubernetes from any specific runtime, enabling alternatives like `containerd` and `CRI-O`.
The CRI specification defines two main gRPC services:
| Service | Responsibility |
|---|---|
| `RuntimeService` | Manages pod sandboxes and containers: creating, starting, stopping, removing, and listing them. |
| `ImageService` | Manages container images: pulling, listing, inspecting, and removing images on the node. |
Why crictl Exists¶
- `crictl` (pronounced "cry-cuttle") is a CLI tool for CRI-compatible container runtimes.
- It provides a way to inspect and debug the container runtime directly on a Kubernetes node.
- It was created as part of the `cri-tools` project by the Kubernetes SIG Node team.
- `crictl` is the recommended replacement for `docker` CLI commands when your cluster uses `containerd` or `CRI-O` instead of Docker.
Common use cases for crictl:
- Debugging container issues that are not visible through `kubectl`
- Inspecting the state of containers and pods at the runtime level
- Viewing logs when the Kubernetes API server is unavailable
- Checking image availability on a specific node
- Investigating pod sandbox networking issues
- Monitoring resource usage at the runtime level
- Troubleshooting CrashLoopBackOff and ImagePullBackOff errors from the node perspective
crictl vs docker CLI vs nerdctl vs ctr¶
Since the removal of dockershim in Kubernetes 1.24, multiple CLI tools exist for interacting with container runtimes. Here is how they compare:
| Feature | `crictl` | `docker` | `nerdctl` | `ctr` |
|---|---|---|---|---|
| Purpose | CRI debugging | Docker Engine management | containerd management (Docker-compatible) | Low-level containerd client |
| Target Runtime | Any CRI runtime | Docker Engine only | containerd only | containerd only |
| Kubernetes-aware | Yes (pods, sandboxes) | No | Yes (with nerdctl ps -a) | No |
| Pod support | Yes (native) | No | Limited | No |
| Image management | Pull, list, remove | Full (build, push, pull) | Full (build, push, pull) | Pull, list, remove |
| Container creation | Manual (debugging only) | Full lifecycle | Full lifecycle | Full lifecycle |
| Build images | No | Yes | Yes | No |
| Compose support | No | Yes (docker compose) | Yes (nerdctl compose) | No |
| Recommended for K8s | Yes (node debugging) | No (deprecated in K8s) | Yes (development) | No (too low-level) |
| Installed by default | Often (kubeadm nodes) | No (unless Docker runtime) | No | Yes (with containerd) |
Rule of Thumb
- Use `kubectl` for cluster-level operations (from your workstation).
- Use `crictl` for node-level debugging (SSH into the node).
- Use `nerdctl` if you need a Docker-compatible CLI for `containerd`.
- Use `ctr` only for very low-level containerd operations.
CRI Architecture¶
The following diagram shows how crictl fits into the Kubernetes container runtime architecture:
graph TB
subgraph "User / Operator"
kubectl["kubectl<br/>(cluster-level)"]
crictl["crictl<br/>(node-level)"]
end
subgraph "Control Plane"
api["API Server"]
end
subgraph "Kubernetes Node"
kubelet["kubelet"]
cri_shim["CRI gRPC Interface<br/>(unix socket)"]
subgraph "Container Runtime"
containerd["containerd / CRI-O"]
oci["OCI Runtime<br/>(runc / crun / kata)"]
end
subgraph "Workloads"
pod1["Pod Sandbox 1"]
c1a["Container A"]
c1b["Container B"]
pod2["Pod Sandbox 2"]
c2a["Container C"]
end
end
kubectl --> api
api --> kubelet
kubelet -->|"CRI gRPC calls"| cri_shim
crictl -->|"CRI gRPC calls"| cri_shim
cri_shim --> containerd
containerd --> oci
oci --> pod1
oci --> pod2
pod1 --- c1a
pod1 --- c1b
pod2 --- c2a
style crictl fill:#f9a825,stroke:#f57f17,color:#000
style kubectl fill:#42a5f5,stroke:#1565c0,color:#000
style kubelet fill:#66bb6a,stroke:#2e7d32,color:#000
style containerd fill:#ab47bc,stroke:#6a1b9a,color:#fff
Key points from the architecture:
- kubectl communicates with the API Server over HTTPS (cluster-level).
- The API Server instructs the kubelet on each node.
- The kubelet communicates with the container runtime via CRI gRPC over a Unix socket.
- crictl connects to the same Unix socket, bypassing the API Server entirely.
- The container runtime (`containerd` or `CRI-O`) delegates actual container execution to an OCI runtime like `runc`.
When to Use crictl vs kubectl¶
| Scenario | Use `kubectl` | Use `crictl` |
|---|---|---|
| Deploy, scale, or manage workloads | Yes | No |
| View pod logs (API server is healthy) | Yes | Optional |
| View pod logs (API server is down or unreachable) | No | Yes |
| Inspect container state on a specific node | Limited | Yes |
| Debug networking at the pod sandbox level | No | Yes |
| Check which images are cached on a node | No | Yes |
| Monitor per-container resource usage on a node | No | Yes |
| Investigate why a container keeps crashing | Partially | Yes (more detail) |
| Create test pods and containers | Yes | Yes (manual) |
| Manage cluster resources (Services, Ingress, etc.) | Yes | No |
Configuration File¶
crictl uses a configuration file at /etc/crictl.yaml to determine which runtime endpoint to connect to. This avoids having to pass the --runtime-endpoint flag with every command.
## /etc/crictl.yaml
## Configuration file for crictl
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
pull-image-on-create: false
disable-pull-on-run: false
| Field | Description |
|---|---|
| `runtime-endpoint` | The Unix socket path for the container runtime's CRI service. |
| `image-endpoint` | The Unix socket path for the image service (often the same as `runtime-endpoint`). |
| `timeout` | Timeout in seconds for CRI gRPC calls. |
| `debug` | When true, enables verbose debug output for all commands. |
| `pull-image-on-create` | When true, automatically pulls the image when creating a container. |
| `disable-pull-on-run` | When true, disables automatic image pulling when running a container. |
Common runtime endpoint paths:
| Runtime | Socket Path |
|---|---|
| `containerd` | `unix:///run/containerd/containerd.sock` |
| `CRI-O` | `unix:///var/run/crio/crio.sock` |
| Docker (via `cri-dockerd`) | `unix:///run/cri-dockerd.sock` |
Prerequisites¶
Before starting this lab, ensure you have:
- A running Kubernetes cluster (single-node or multi-node)
- SSH access to at least one Kubernetes node (control plane or worker)
- Root or sudo privileges on the node
- A container runtime installed on the node (`containerd` or `CRI-O`)
- Basic familiarity with `kubectl` and Kubernetes concepts (pods, containers, namespaces)
- Some workloads already running on the cluster (for inspection)
This Lab Requires Node Access
Unlike most Kubernetes labs where you run commands from your workstation using kubectl, this lab requires you to SSH into a Kubernetes node and run commands directly on it. If you are using a managed Kubernetes service (EKS, GKE, AKS), you will need to SSH into a worker node or use a node shell utility like kubectl debug node/<node-name> -it --image=ubuntu.
To SSH into a node (example):
## If you know the node IP address
ssh user@<node-ip>
## Using kubectl to get node IPs first
kubectl get nodes -o wide
## Alternative: use kubectl debug to get a shell on a node
## (requires Kubernetes 1.18+ with ephemeral containers enabled)
kubectl debug node/<node-name> -it --image=ubuntu
Lab¶
Step 01 - Install crictl¶
- `crictl` is distributed as a standalone binary from the `cri-tools` project.
- On many Kubernetes distributions (kubeadm, k3s, etc.), `crictl` is already installed. Check first before installing.
Check if crictl is Already Installed¶
## Check if crictl is available on the node
which crictl
## If installed, check the version
crictl --version
## Expected output (example):
## crictl version v1.29.0
Install crictl (if not present)¶
## Set the desired version
VERSION="v1.29.0"
## Download the crictl tarball
curl -L "https://github.com/kubernetes-sigs/cri-tools/releases/download/${VERSION}/crictl-${VERSION}-linux-amd64.tar.gz" \
-o crictl-${VERSION}-linux-amd64.tar.gz
## Extract the binary to /usr/local/bin
sudo tar zxvf crictl-${VERSION}-linux-amd64.tar.gz -C /usr/local/bin
## Verify the installation
crictl --version
## Clean up the tarball
rm -f crictl-${VERSION}-linux-amd64.tar.gz
## Set the desired version
VERSION="v1.29.0"
## Download the crictl tarball for arm64
curl -L "https://github.com/kubernetes-sigs/cri-tools/releases/download/${VERSION}/crictl-${VERSION}-linux-arm64.tar.gz" \
-o crictl-${VERSION}-linux-arm64.tar.gz
## Extract the binary to /usr/local/bin
sudo tar zxvf crictl-${VERSION}-linux-arm64.tar.gz -C /usr/local/bin
## Verify the installation
crictl --version
## Clean up the tarball
rm -f crictl-${VERSION}-linux-arm64.tar.gz
Version Compatibility
It is recommended to use a crictl version that matches your Kubernetes minor version.
For example, use crictl v1.29.x with Kubernetes v1.29.x.
Check the compatibility matrix for details.
Step 02 - Configure crictl¶
- Before using `crictl`, you need to configure it to connect to the correct container runtime socket.
- Without configuration, `crictl` will attempt to auto-detect the runtime, but it is best to be explicit.
Identify Your Container Runtime¶
## Check which container runtime the kubelet is using
ps aux | grep kubelet | grep -- --container-runtime-endpoint
## Alternative: check the kubelet configuration
sudo cat /var/lib/kubelet/config.yaml | grep -i container
## For containerd, check if the socket exists
ls -la /run/containerd/containerd.sock
## For CRI-O, check if the socket exists
ls -la /var/run/crio/crio.sock
Create the Configuration File¶
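The configuration lives at `/etc/crictl.yaml`. A minimal file for containerd (assuming the default socket path) looks like:

```yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
```

Write it as root (for example with `sudo tee /etc/crictl.yaml`). For CRI-O, set both endpoints to `unix:///var/run/crio/crio.sock` instead.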
Using the --runtime-endpoint Flag (Alternative)¶
## Instead of a config file, you can specify the endpoint per command
crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps
## Or set it as an environment variable
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/containerd/containerd.sock
## Now all crictl commands will use this endpoint
crictl ps
Enable Debug Mode (Optional)¶
## Temporarily enable debug mode for troubleshooting
## This shows the gRPC calls being made to the runtime
sudo tee /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: true
EOF
## Run a command with debug output
crictl ps
## Remember to disable debug mode when done
## (set debug: false in /etc/crictl.yaml)
Debug Mode
Enabling debug: true in /etc/crictl.yaml prints the raw gRPC requests and responses.
This is extremely useful when troubleshooting connectivity issues with the container runtime socket.
Test the Configuration¶
## Verify crictl can communicate with the runtime
crictl info
## Expected output: JSON with runtime information including
## version, storage driver, and runtime conditions
## Quick sanity check: list running containers
crictl ps
## List all pods managed by the runtime
crictl pods
Step 03 - Listing Pods with crictl pods¶
- The `crictl pods` command lists all pod sandboxes managed by the container runtime.
- A pod sandbox is the runtime’s representation of a Kubernetes pod. It holds the shared Linux namespaces (network, IPC, PID) that containers within the pod share.
Basic Pod Listing¶
## List all pod sandboxes (running and stopped)
crictl pods
## Example output:
## POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
## a1b2c3d4e5f6 2 hours ago Ready nginx-7d456b8f9c-abcde default 0 (default)
## f6e5d4c3b2a1 3 hours ago Ready coredns-5dd5756b68-xyz12 kube-system 0 (default)
Filtering Pods¶
## Filter pods by name
crictl pods --name nginx
## Filter pods by namespace
crictl pods --namespace kube-system
## Filter pods by state (Ready or NotReady)
crictl pods --state Ready
## Filter pods that are not ready (stopped, failed, etc.)
crictl pods --state NotReady
## Filter pods by label
crictl pods --label app=nginx
## Filter by multiple labels
crictl pods --label app=nginx --label version=v1
## Combine multiple filters
crictl pods --namespace default --state Ready --label app=nginx
Verbose Pod Listing¶
## Show full pod IDs (not truncated)
crictl pods --no-trunc
## Show only pod IDs (useful for scripting)
crictl pods --quiet
## Show pods with additional information in verbose mode
crictl pods --verbose
## Output in JSON format for programmatic consumption
crictl pods -o json
## Output in YAML format
crictl pods -o yaml
## Output in table format (default)
crictl pods -o table
Listing Pods by Last N¶
## Show only the last 5 pods created
crictl pods --last 5
## Show the most recently created pod
crictl pods --last 1
Pod Sandbox vs Kubernetes Pod
Every Kubernetes pod corresponds to a pod sandbox in the container runtime.
The sandbox is created first (with its own “pause” container), and then the actual
application containers are started inside it. When you see a sandbox in crictl pods,
it maps 1:1 to a pod you would see in kubectl get pods.
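That 1:1 mapping is visible in the sandbox labels, which the kubelet copies from the API-server pod. A small sketch using a trimmed sample of `crictl pods -o json` output (so it runs anywhere; on a real node pipe the live `crictl pods -o json` instead):

```shell
## Sample (trimmed) `crictl pods -o json` output; the kubelet sets these labels
PODS_JSON='{"items":[{"metadata":{"name":"nginx-7d456b8f9c-abcde","namespace":"default"},"labels":{"io.kubernetes.pod.uid":"1234-abcd"}}]}'

## Print namespace/name plus the Kubernetes pod UID for each sandbox
MAPPING=$(echo "$PODS_JSON" | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name) uid=\(.labels["io.kubernetes.pod.uid"])"')
echo "$MAPPING"
```

The `io.kubernetes.pod.uid` value matches `kubectl get pod <name> -o jsonpath='{.metadata.uid}'`.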
Step 04 - Inspecting Pods with crictl inspectp¶
- The `crictl inspectp` command provides detailed information about a specific pod sandbox.
- This includes network configuration, DNS settings, Linux namespace paths, labels, annotations, and more.
Inspect a Pod¶
## First, get the pod ID you want to inspect
crictl pods
## Note the POD ID from the output (e.g., a1b2c3d4e5f6)
## Inspect the pod sandbox by its ID
crictl inspectp a1b2c3d4e5f6
## You can also use a partial ID (as long as it is unique)
crictl inspectp a1b2c3
Understanding the Inspect Output¶
The output is a JSON document with several important sections:
## Inspect a pod and pipe through jq for readability
crictl inspectp a1b2c3d4e5f6 | jq .
## Get the pod's network information
## This shows the pod IP address and network namespace
crictl inspectp a1b2c3d4e5f6 | jq '.status.network'
## Example output:
## {
## "additionalIps": [],
## "ip": "10.244.0.15"
## }
## Get the pod's Linux namespace paths
crictl inspectp a1b2c3d4e5f6 | jq '.info.runtimeSpec.linux.namespaces'
## Get the pod's labels
crictl inspectp a1b2c3d4e5f6 | jq '.status.labels'
## Get the pod's annotations
crictl inspectp a1b2c3d4e5f6 | jq '.status.annotations'
## Get the pod sandbox's Linux resource settings (cgroup configuration)
crictl inspectp a1b2c3d4e5f6 | jq '.info.runtimeSpec.linux.resources'
## Get the creation timestamp
crictl inspectp a1b2c3d4e5f6 | jq '.status.createdAt'
## Get the pod's state
crictl inspectp a1b2c3d4e5f6 | jq '.status.state'
Inspect Network Namespace¶
## Get the network namespace path for a pod
## This is useful for debugging networking issues with nsenter
NETNS=$(crictl inspectp a1b2c3d4e5f6 | jq -r '.info.runtimeSpec.linux.namespaces[] | select(.type=="network") | .path')
echo "Network namespace: $NETNS"
## Enter the pod's network namespace to debug networking
## (requires nsenter installed on the node)
sudo nsenter --net=$NETNS ip addr show
## Check the routes inside the pod's network namespace
sudo nsenter --net=$NETNS ip route show
## Check DNS resolution from inside the network namespace
sudo nsenter --net=$NETNS cat /etc/resolv.conf
Info Section Requires Runtime Support
The .info section in the crictl inspectp output requires the container runtime to
support the verbose inspection feature. Some runtimes may return an empty info section.
If this happens, the .status section will still contain essential information like
the pod ID, state, labels, and creation time.
Step 05 - Listing Containers with crictl ps¶
- The `crictl ps` command lists containers managed by the container runtime.
- By default, it shows only running containers. Use the `-a` flag to include all states.
Basic Container Listing¶
## List all running containers
crictl ps
## Example output:
## CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
## b1c2d3e4f5g6 nginx:1.25 2 hours ago Running nginx 0 a1b2c3d4e5f6 nginx-7d456b8f9c-abcde
## h7i8j9k0l1m2 coredns:1.11 3 hours ago Running coredns 0 f6e5d4c3b2a1 coredns-5dd5756b68-xyz12
## List ALL containers (including exited, created, unknown states)
crictl ps -a
## Example output showing exited containers:
## CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
## b1c2d3e4f5g6 nginx:1.25 2 hours ago Running nginx 0 a1b2c3d4e5f6 nginx-...
## n3o4p5q6r7s8 busybox:1.36 1 hour ago Exited init-db 0 t9u0v1w2x3y4 myapp-...
Filtering Containers¶
## Filter containers by name
crictl ps --name nginx
## Filter containers by state
crictl ps --state Running
crictl ps --state Exited
crictl ps --state Created
crictl ps --state Unknown
## Filter containers by image
crictl ps --image nginx
## Filter containers by label
crictl ps --label io.kubernetes.container.name=nginx
## Filter containers belonging to a specific pod
crictl ps --pod a1b2c3d4e5f6
## Combine filters: show exited containers for a specific pod
crictl ps -a --state Exited --pod a1b2c3d4e5f6
## Show only the last N containers
crictl ps -a --last 10
Output Formatting¶
## Show full IDs (not truncated)
crictl ps --no-trunc
## Show only container IDs (useful for scripting)
crictl ps --quiet
## JSON output for programmatic processing
crictl ps -o json
## YAML output
crictl ps -o yaml
## Show all containers with verbose output
crictl ps -a --verbose
Counting Containers by State¶
## Count running containers
crictl ps --quiet | wc -l
## Count all containers (including stopped)
crictl ps -a --quiet | wc -l
## Count exited containers only
crictl ps -a --state Exited --quiet | wc -l
## List all exited containers with their state
## (exit codes are not in the ps output; use crictl inspect for those)
crictl ps -a --state Exited -o json | jq '.containers[] | {name: .metadata.name, id: .id, state: .state}'
Container States
Containers in CRI can be in one of four states:
- Created: The container has been created but not started.
- Running: The container is currently executing.
- Exited: The container has finished executing (with an exit code).
- Unknown: The container state cannot be determined.
Step 06 - Inspecting Containers with crictl inspect¶
- The `crictl inspect` command provides detailed information about a specific container.
- This includes the container’s full configuration, resource limits, mounts, environment variables, process info, and more.
Inspect a Container¶
## First, get the container ID
crictl ps
## Note the CONTAINER ID (e.g., b1c2d3e4f5g6)
## Inspect the container
crictl inspect b1c2d3e4f5g6
## Inspect with pretty-printed JSON (pipe through jq)
crictl inspect b1c2d3e4f5g6 | jq .
Extract Specific Information¶
## Get the container's image reference
crictl inspect b1c2d3e4f5g6 | jq '.status.image.image'
## Get the container's PID (process ID on the host)
crictl inspect b1c2d3e4f5g6 | jq '.info.pid'
## Get the container's resource limits (CPU and memory)
crictl inspect b1c2d3e4f5g6 | jq '.info.runtimeSpec.linux.resources'
## Get CPU limits specifically
crictl inspect b1c2d3e4f5g6 | jq '.info.runtimeSpec.linux.resources.cpu'
## Get memory limits specifically
crictl inspect b1c2d3e4f5g6 | jq '.info.runtimeSpec.linux.resources.memory'
## Get the container's environment variables
crictl inspect b1c2d3e4f5g6 | jq '.info.runtimeSpec.process.env'
## Get the container's mount points
crictl inspect b1c2d3e4f5g6 | jq '.info.runtimeSpec.mounts'
## Get the container's startup command and arguments
crictl inspect b1c2d3e4f5g6 | jq '.info.runtimeSpec.process.args'
## Get the container's working directory
crictl inspect b1c2d3e4f5g6 | jq '.info.runtimeSpec.process.cwd'
## Get the container's state and exit code (for exited containers)
crictl inspect b1c2d3e4f5g6 | jq '{state: .status.state, exitCode: .status.exitCode, reason: .status.reason}'
## Get the container's creation and start timestamps
crictl inspect b1c2d3e4f5g6 | jq '{created: .status.createdAt, started: .status.startedAt, finished: .status.finishedAt}'
## Get the container's log path
crictl inspect b1c2d3e4f5g6 | jq '.status.logPath'
Inspecting Exited Containers¶
## When debugging a CrashLoopBackOff, inspect the exited container
## to find the exit code and reason
## List exited containers
crictl ps -a --state Exited
## Inspect the exited container
crictl inspect n3o4p5q6r7s8 | jq '{
name: .status.metadata.name,
state: .status.state,
exitCode: .status.exitCode,
reason: .status.reason,
message: .status.message,
startedAt: .status.startedAt,
finishedAt: .status.finishedAt
}'
## Common exit codes:
## 0 = Success (normal termination)
## 1 = General error
## 126 = Command not executable
## 127 = Command not found
## 137 = Killed by SIGKILL (OOMKilled or manual kill)
## 139 = Segmentation fault (SIGSEGV)
## 143 = Killed by SIGTERM (graceful shutdown)
Understanding Exit Code 137
Exit code 137 (128 + 9 = SIGKILL) is one of the most common crash indicators.
It usually means the container was OOMKilled (ran out of memory) or was killed by
the kubelet because it exceeded its memory limit. Check the container’s memory limits
and actual usage with crictl inspect and crictl stats to confirm.
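The arithmetic behind these codes is the shell convention of 128 plus the fatal signal number, which you can verify directly:

```shell
## Fatal-signal exit codes follow the convention: 128 + signal number
SIGKILL_CODE=$((128 + 9))    # SIGKILL -> 137 (OOMKill, forced kill)
SIGSEGV_CODE=$((128 + 11))   # SIGSEGV -> 139 (segmentation fault)
SIGTERM_CODE=$((128 + 15))   # SIGTERM -> 143 (graceful shutdown)
echo "$SIGKILL_CODE $SIGSEGV_CODE $SIGTERM_CODE"
```

So any exit code above 128 means the process was killed by a signal, and subtracting 128 tells you which one.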
Step 07 - Viewing Container Logs with crictl logs¶
- The `crictl logs` command reads the stdout/stderr logs of a container.
- It works similarly to `kubectl logs` but operates directly at the runtime level.
- This is especially useful when the Kubernetes API server is unavailable.
Basic Log Viewing¶
## View logs for a running container
crictl logs b1c2d3e4f5g6
## View logs for an exited container (useful for crash debugging)
## First find the exited container ID
crictl ps -a --state Exited
## Then view its logs
crictl logs n3o4p5q6r7s8
Log Options¶
## Follow logs in real-time (like tail -f)
crictl logs --follow b1c2d3e4f5g6
## Show only the last N lines
crictl logs --tail 50 b1c2d3e4f5g6
## Show only the last 10 lines
crictl logs --tail 10 b1c2d3e4f5g6
## Show logs with timestamps
crictl logs --timestamps b1c2d3e4f5g6
## Example output with timestamps:
## 2024-01-15T10:30:45.123456789Z 172.17.0.1 - - [15/Jan/2024:10:30:45 +0000] "GET / HTTP/1.1" 200 615
## Show logs since a specific time (RFC3339 format)
crictl logs --since "2024-01-15T10:00:00Z" b1c2d3e4f5g6
## Show logs from the last 5 minutes
## (Calculate the timestamp 5 minutes ago)
crictl logs --since "$(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" b1c2d3e4f5g6
## Show logs until a specific time
crictl logs --until "2024-01-15T11:00:00Z" b1c2d3e4f5g6
## Combine: follow logs with timestamps, showing only the last 20 lines
crictl logs --follow --timestamps --tail 20 b1c2d3e4f5g6
Viewing Logs of Previous Container Instances¶
## When a container restarts (CrashLoopBackOff), you may want to see
## logs from the previous instance
## List all containers (including exited) for a specific pod
crictl ps -a --pod a1b2c3d4e5f6
## The output shows ATTEMPT numbers:
## CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
## b1c2d3e4f5g6 nginx 1 min ago Running nginx 3 a1b2...
## x1y2z3a4b5c6 nginx 3 min ago Exited nginx 2 a1b2...
## d6e7f8g9h0i1 nginx 5 min ago Exited nginx 1 a1b2...
## View logs from the previous (exited) container instance
crictl logs x1y2z3a4b5c6
## View logs from the oldest instance
crictl logs d6e7f8g9h0i1
Log File Location
Container logs are stored on disk at a path determined by the container runtime.
You can find the log path with crictl inspect <container-id> | jq '.status.logPath'.
For containerd, logs are typically stored under /var/log/pods/.
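As a sketch of the typical kubelet/containerd layout (an assumption based on common defaults; the authoritative answer is always the `logPath` from `crictl inspect`), pod log paths are composed like this:

```shell
## Typical log path layout under /var/log/pods (verify via logPath):
## /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/<restart-count>.log
NS=default
POD=nginx-7d456b8f9c-abcde
POD_UID=aaaabbbb-cccc-dddd-eeee-ffff00001111   # hypothetical pod UID
CONTAINER=nginx
RESTART=0
LOG_PATH="/var/log/pods/${NS}_${POD}_${POD_UID}/${CONTAINER}/${RESTART}.log"
echo "$LOG_PATH"
```

Each container restart gets a new `<restart-count>.log` file, which is how logs from previous instances remain readable.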
Step 08 - Executing Commands Inside Containers with crictl exec¶
- The `crictl exec` command runs a command inside a running container.
- It works similarly to `kubectl exec` but at the runtime level.
Basic Command Execution¶
## Execute a single command in a container
crictl exec b1c2d3e4f5g6 ls /
## Execute a command with arguments
crictl exec b1c2d3e4f5g6 cat /etc/hostname
## Check the container's IP address
crictl exec b1c2d3e4f5g6 ip addr show
## View running processes inside the container
crictl exec b1c2d3e4f5g6 ps aux
## Check DNS resolution inside the container
crictl exec b1c2d3e4f5g6 cat /etc/resolv.conf
## Test connectivity to another in-cluster service from inside the container
crictl exec b1c2d3e4f5g6 wget -qO- --timeout=5 http://<service-name>.<namespace>.svc.cluster.local/
Interactive Shell¶
## Open an interactive shell inside the container
## -i = interactive (keep stdin open)
## -t = allocate a pseudo-TTY
crictl exec -it b1c2d3e4f5g6 /bin/sh
## If the container has bash available
crictl exec -it b1c2d3e4f5g6 /bin/bash
## Once inside the container, you can run commands interactively:
## # ls /app
## # env
## # curl localhost:8080/health
## # exit
Debugging with exec¶
## Check if a specific port is listening inside the container
crictl exec b1c2d3e4f5g6 netstat -tlnp
## Check environment variables (useful for debugging misconfiguration)
crictl exec b1c2d3e4f5g6 env
## Check file system mounts inside the container
crictl exec b1c2d3e4f5g6 mount
## Read application configuration files
crictl exec b1c2d3e4f5g6 cat /etc/nginx/nginx.conf
## Check available disk space inside the container
crictl exec b1c2d3e4f5g6 df -h
## Test if another service is reachable from this container
crictl exec b1c2d3e4f5g6 ping -c 3 google.com
exec Only Works on Running Containers
You cannot use crictl exec on exited or stopped containers.
For debugging crashed containers, use crictl logs to view their output, or
crictl inspect to examine their exit code and state.
Step 09 - Image Management¶
- `crictl` provides commands to manage container images on the node.
- This is useful for verifying that images are available, checking disk usage, and cleaning up unused images.
Listing Images¶
## List all images on the node
crictl images
## Example output:
## IMAGE TAG IMAGE ID SIZE
## docker.io/library/nginx 1.25 a6bd71f48f68 67.3MB
## registry.k8s.io/coredns/coredns v1.11.1 cbb01a7bd410 16.2MB
## registry.k8s.io/etcd 3.5.10 a0d6d4a97e3c 102MB
## registry.k8s.io/kube-apiserver v1.29.0 e7972205b661 128MB
## registry.k8s.io/kube-proxy v1.29.0 2d4e6e23bb50 82.5MB
## registry.k8s.io/pause 3.9 e6f181688397 744kB
## Show full image IDs (not truncated)
crictl images --no-trunc
## Show only image IDs (useful for scripting)
crictl images --quiet
## Show images with digests
crictl images --digests
## Output in JSON format
crictl images -o json
## Output in YAML format
crictl images -o yaml
## Filter images by repository name
crictl images nginx
## Filter images by full reference
crictl images docker.io/library/nginx:1.25
Pulling Images¶
## Pull an image from a registry
crictl pull nginx:latest
## Pull a specific version
crictl pull nginx:1.25.3
## Pull from a specific registry
crictl pull docker.io/library/alpine:3.19
## Pull from a private registry (requires authentication configured in containerd)
crictl pull myregistry.example.com/myapp:v1.0
## Pull with explicit registry credentials (username:password)
crictl pull --creds "username:password" myregistry.example.com/myapp:v1.0
Inspecting Images¶
## Inspect a specific image to see its details
crictl inspecti nginx:latest
## Pretty-print the image inspection
crictl inspecti nginx:latest | jq .
## Get the image size in bytes
crictl inspecti nginx:latest | jq '.status.size'
## Get the image's environment variables
crictl inspecti nginx:latest | jq '.info.imageSpec.config.Env'
## Get the image's entrypoint and command
crictl inspecti nginx:latest | jq '{entrypoint: .info.imageSpec.config.Entrypoint, cmd: .info.imageSpec.config.Cmd}'
## Get the image's exposed ports
crictl inspecti nginx:latest | jq '.info.imageSpec.config.ExposedPorts'
## Get the image's labels
crictl inspecti nginx:latest | jq '.info.imageSpec.config.Labels'
Removing Images¶
## Remove a specific image by tag
crictl rmi nginx:latest
## Remove a specific image by image ID
crictl rmi a6bd71f48f68
## Remove multiple images at once
crictl rmi nginx:latest alpine:3.19 busybox:latest
## Remove all unused images (images not referenced by any container)
crictl rmi --prune
## Force remove all images (use with caution!)
## This will remove images even if containers are using them
crictl rmi --all
Image Filesystem Information¶
## Show image filesystem usage on the node
crictl imagefsinfo
## Pretty-print the output
crictl imagefsinfo | jq .
## This shows:
## - Timestamp of the information
## - Filesystem identifier (device, mountpoint)
## - Used bytes and inodes
## - Total bytes and inodes available
Removing Images in Production
Be very careful when removing images on production nodes.
If you remove an image that a running pod depends on, the pod may fail to restart
(e.g., during a node reboot or pod rescheduling).
Always use crictl rmi --prune instead of crictl rmi --all in production environments.
Step 10 - Stats and Resource Usage¶
- `crictl` provides commands to monitor resource usage of containers and pods in real time.
- This is useful for identifying resource-hungry containers and debugging performance issues.
Container Stats¶
## Show resource usage for all running containers
crictl stats
## Example output:
## CONTAINER CPU % MEM DISK INODES
## b1c2d3e4f5g6 0.15 25.6MB 12.4kB 15
## h7i8j9k0l1m2 0.08 18.2MB 8.2kB 12
## Show stats for a specific container
crictl stats b1c2d3e4f5g6
## Show stats for all containers (including stopped)
crictl stats -a
## Output stats in JSON format
crictl stats -o json
## Output stats in YAML format
crictl stats -o yaml
## Filter stats by label
crictl stats --label io.kubernetes.container.name=nginx
## Show stats by container ID
crictl stats --id b1c2d3e4f5g6
Pod Stats¶
## Show resource usage for all pods (aggregated)
crictl statsp
## Example output:
## POD CPU % MEM
## a1b2c3d4e5f6 0.23 43.8MB
## f6e5d4c3b2a1 0.08 18.2MB
## Show stats for a specific pod
crictl statsp --id a1b2c3d4e5f6
## Filter pod stats by label
crictl statsp --label app=nginx
## Output pod stats in JSON format
crictl statsp -o json
Analyzing Resource Usage¶
## Find the container using the most memory
## (using JSON output and jq for sorting)
crictl stats -o json | jq -r '.stats | sort_by(.memory.workingSetBytes.value) | reverse | .[] | "\(.attributes.id[0:12]) \(.attributes.labels["io.kubernetes.container.name"]) \(.memory.workingSetBytes.value / 1048576)MB"'
## Find the container using the most CPU
crictl stats -o json | jq -r '.stats | sort_by(.cpu.usageCoreNanoSeconds.value) | reverse | .[] | "\(.attributes.id[0:12]) \(.attributes.labels["io.kubernetes.container.name"]) CPU: \(.cpu.usageCoreNanoSeconds.value)"'
## Monitor stats continuously (watch mode)
## Use the 'watch' command to refresh stats every 2 seconds
watch -n 2 crictl stats
## Monitor a specific container continuously
watch -n 2 crictl stats --id b1c2d3e4f5g6
Comparing with kubectl top
crictl stats shows the raw resource usage from the container runtime, while
kubectl top pods shows metrics from the Kubernetes Metrics Server.
The values may differ slightly because they come from different sources and
measure different things. crictl stats is node-local and does not require
the Metrics Server to be installed.
Step 11 - Runtime Info and Debugging¶
- `crictl` provides commands to query the container runtime itself for version information, configuration, and health status.
- These commands are essential for diagnosing runtime-level issues.
Runtime Version¶
## Show the CRI version and the container runtime version
crictl version
## Example output:
## Version: 0.1.0
## RuntimeName: containerd
## RuntimeVersion: v1.7.11
## RuntimeApiVersion: v1
Runtime Information¶
## Show detailed runtime information (configuration, features, status)
crictl info
## Pretty-print the runtime info
crictl info | jq .
## Get the runtime's storage driver
crictl info | jq '.config.containerd.snapshotter'
## Get the runtime's CNI configuration
crictl info | jq '.config.cni'
## Get the runtime's cgroup driver
crictl info | jq '.config.containerd.runtimes.runc.options.SystemdCgroup'
## Check if the runtime is ready
crictl info | jq '.status.conditions'
## Example output for conditions:
## [
## { "type": "RuntimeReady", "status": true, "reason": "", "message": "" },
## { "type": "NetworkReady", "status": true, "reason": "", "message": "" }
## ]
Runtime Status and Health Check¶
## Check the runtime status conditions
## This tells you if the runtime and network plugins are healthy
crictl info | jq '.status'
## Check specifically if RuntimeReady is true
crictl info | jq '.status.conditions[] | select(.type=="RuntimeReady")'
## Check specifically if NetworkReady is true
crictl info | jq '.status.conditions[] | select(.type=="NetworkReady")'
Debugging Common Runtime Issues¶
## Check if the runtime socket exists and is accessible
ls -la /run/containerd/containerd.sock
## Check if the containerd service is running
systemctl status containerd
## Check containerd logs for errors
journalctl -u containerd --since "10 minutes ago" --no-pager
## For CRI-O:
systemctl status crio
journalctl -u crio --since "10 minutes ago" --no-pager
## Check the kubelet logs for CRI-related errors
journalctl -u kubelet --since "10 minutes ago" --no-pager | grep -i "cri\|runtime\|container"
## Verify the CRI socket permissions
stat /run/containerd/containerd.sock
## Test connectivity to the runtime socket directly
crictl --runtime-endpoint unix:///run/containerd/containerd.sock version
RuntimeReady and NetworkReady
If crictl info shows RuntimeReady: false or NetworkReady: false, it means
the container runtime or the CNI plugin is not functioning correctly. This will prevent
new pods from starting on this node. Check the runtime logs and CNI configuration immediately.
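The two conditions can be checked in one small script. (A sketch: the sample JSON lets the parsing run anywhere; on a real node set `INFO=$(crictl info)` instead.)

```shell
## Sample `crictl info` status section (trimmed) for illustration
INFO='{"status":{"conditions":[{"type":"RuntimeReady","status":true},{"type":"NetworkReady","status":false}]}}'

## Extract each readiness condition with jq
RUNTIME_READY=$(echo "$INFO" | jq -r '.status.conditions[] | select(.type=="RuntimeReady") | .status')
NETWORK_READY=$(echo "$INFO" | jq -r '.status.conditions[] | select(.type=="NetworkReady") | .status')
echo "RuntimeReady=$RUNTIME_READY NetworkReady=$NETWORK_READY"

## Flag an unhealthy node if either condition is not true
if [ "$RUNTIME_READY" != "true" ] || [ "$NETWORK_READY" != "true" ]; then
  echo "Node runtime is NOT healthy - check runtime logs and CNI configuration"
fi
```

This is handy in node troubleshooting scripts, since it works even when the API server is unreachable.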
Step 12 - Advanced Debugging: Creating Pods and Containers Manually¶
- For advanced debugging scenarios, `crictl` allows you to manually create pod sandboxes and containers.
- This is useful for testing runtime behavior, reproducing issues, and understanding the pod lifecycle.
Manual Pod/Container Creation
Pods and containers created manually with crictl are not managed by the kubelet.
The kubelet will not know about them, they will not appear in kubectl get pods, and
they will not be automatically restarted or garbage collected.
Use this only for debugging purposes and clean up afterward.
Create a Pod Sandbox¶
## First, create a pod sandbox configuration file
cat <<EOF > /tmp/pod-sandbox-config.json
{
"metadata": {
"name": "debug-sandbox",
"namespace": "default",
"attempt": 1,
"uid": "debug-sandbox-uid-001"
},
"log_directory": "/tmp/debug-sandbox-logs",
"linux": {}
}
EOF
## Create the log directory
mkdir -p /tmp/debug-sandbox-logs
## Create (run) the pod sandbox
## The 'runp' command creates and starts a new pod sandbox
SANDBOX_ID=$(crictl runp /tmp/pod-sandbox-config.json)
echo "Created sandbox: $SANDBOX_ID"
## Verify the sandbox was created
crictl pods --name debug-sandbox
## Inspect the sandbox
crictl inspectp $SANDBOX_ID | jq .
Create a Container Inside the Sandbox¶
## Create a container configuration file
cat <<EOF > /tmp/container-config.json
{
"metadata": {
"name": "debug-container"
},
"image": {
"image": "docker.io/library/busybox:latest"
},
"command": [
"/bin/sh", "-c", "echo 'Hello from crictl debug container!' && sleep 3600"
],
"log_path": "debug-container.log",
"linux": {}
}
EOF
## Make sure the image is available locally
crictl pull busybox:latest
## Create the container (does NOT start it yet)
## Arguments: sandbox-id container-config sandbox-config
CONTAINER_ID=$(crictl create $SANDBOX_ID /tmp/container-config.json /tmp/pod-sandbox-config.json)
echo "Created container: $CONTAINER_ID"
## Verify the container was created (state = Created)
crictl ps -a --id $CONTAINER_ID
## Start the container
crictl start $CONTAINER_ID
## Verify the container is now running
crictl ps --id $CONTAINER_ID
Interact with the Manual Container¶
## View the container logs
crictl logs $CONTAINER_ID
## Execute a command inside the container
crictl exec $CONTAINER_ID hostname
## Open an interactive shell
crictl exec -it $CONTAINER_ID /bin/sh
Clean Up Manual Pods and Containers¶
## Stop the container
crictl stop $CONTAINER_ID
## Remove the container
crictl rm $CONTAINER_ID
## Stop the pod sandbox
crictl stopp $SANDBOX_ID
## Remove the pod sandbox
crictl rmp $SANDBOX_ID
## Verify everything is cleaned up
crictl pods --name debug-sandbox
crictl ps -a --id $CONTAINER_ID
## Remove the temporary config files
rm -f /tmp/pod-sandbox-config.json /tmp/container-config.json
rm -rf /tmp/debug-sandbox-logs
Understanding the Manual Lifecycle¶
The manual pod/container lifecycle mirrors what the kubelet does automatically:
graph LR
A["runp<br/>(create sandbox)"] --> B["create<br/>(create container)"]
B --> C["start<br/>(start container)"]
C --> D["exec / logs<br/>(interact)"]
D --> E["stop<br/>(stop container)"]
E --> F["rm<br/>(remove container)"]
F --> G["stopp<br/>(stop sandbox)"]
G --> H["rmp<br/>(remove sandbox)"]
style A fill:#42a5f5,stroke:#1565c0,color:#000
style B fill:#66bb6a,stroke:#2e7d32,color:#000
style C fill:#66bb6a,stroke:#2e7d32,color:#000
style D fill:#f9a825,stroke:#f57f17,color:#000
style E fill:#ef5350,stroke:#c62828,color:#fff
style F fill:#ef5350,stroke:#c62828,color:#fff
style G fill:#ef5350,stroke:#c62828,color:#fff
style H fill:#ef5350,stroke:#c62828,color:#fff
kubelet Garbage Collection
In normal Kubernetes operation, the kubelet automatically garbage collects exited containers
and unused pod sandboxes. When you create pods and containers manually with crictl,
the kubelet may garbage collect them if they are not associated with a known pod
in the API server. Always clean up your manual resources promptly.
crictl Command Reference¶
Below is a quick-reference table of all major crictl commands:
| Command | Description |
|---|---|
| `crictl pods` | List pod sandboxes |
| `crictl inspectp` | Inspect a pod sandbox |
| `crictl runp` | Create and start a pod sandbox |
| `crictl stopp` | Stop a pod sandbox |
| `crictl rmp` | Remove a pod sandbox |
| `crictl ps` | List containers |
| `crictl inspect` | Inspect a container |
| `crictl create` | Create a container |
| `crictl start` | Start a container |
| `crictl stop` | Stop a container |
| `crictl rm` | Remove a container |
| `crictl exec` | Execute a command in a running container |
| `crictl logs` | View container logs |
| `crictl attach` | Attach to a running container |
| `crictl port-forward` | Forward local port to a pod sandbox |
| `crictl images` | List images |
| `crictl inspecti` | Inspect an image |
| `crictl pull` | Pull an image |
| `crictl rmi` | Remove an image |
| `crictl imagefsinfo` | Show image filesystem info |
| `crictl stats` | Show container resource usage |
| `crictl statsp` | Show pod resource usage |
| `crictl info` | Show runtime information |
| `crictl version` | Show CRI and runtime versions |
| `crictl completion` | Generate shell completion scripts |
Exercises¶
The following exercises will test your understanding of crictl concepts.
Try to solve each exercise on your own before revealing the solution.
01. List All Pods in a Specific Namespace¶
List all pod sandboxes running in the kube-system namespace using crictl.
Scenario:¶
- You are debugging DNS issues and need to check the state of CoreDNS pods on a specific node.
- You need to verify that all system-level pods are in the Ready state.
Hint: Use crictl pods with the --namespace filter.
Solution
## List all pods in the kube-system namespace
crictl pods --namespace kube-system
## List only Ready pods in kube-system
crictl pods --namespace kube-system --state Ready
## List pods in kube-system with specific labels (e.g., CoreDNS)
crictl pods --namespace kube-system --label k8s-app=kube-dns
## Show full output in JSON format for detailed inspection
crictl pods --namespace kube-system -o json | jq '.items[] | {name: .metadata.name, state: .state, id: .id}'
02. Find All Exited Containers and Get Their Exit Codes¶
Identify all containers that have exited and determine their exit codes.
Scenario:¶
- A pod is in CrashLoopBackOff and you need to understand why containers keep failing.
- Exit codes provide critical information about the nature of the failure.
Hint: Use crictl ps -a to show all containers, filter by state Exited, and then use crictl inspect to get the exit codes.
Solution
## List all exited containers
crictl ps -a --state Exited
## Get exit codes for all exited containers using JSON output and jq
crictl ps -a --state Exited -o json | jq '.containers[] | {
name: .metadata.name,
id: .id,
podSandboxId: .podSandboxId,
state: .state,
  createdAt: .createdAt
}'
## Inspect a specific exited container for its exit code
## Replace <container-id> with the actual container ID
crictl inspect <container-id> | jq '{
name: .status.metadata.name,
exitCode: .status.exitCode,
reason: .status.reason,
message: .status.message,
finishedAt: .status.finishedAt
}'
## Quick one-liner: list all exited containers with exit codes
for CID in $(crictl ps -a --state Exited -q); do
NAME=$(crictl inspect $CID 2>/dev/null | jq -r '.status.metadata.name')
EXIT=$(crictl inspect $CID 2>/dev/null | jq -r '.status.exitCode')
echo "Container: $CID Name: $NAME ExitCode: $EXIT"
done
## Common exit codes reference:
## 0 = Success
## 1 = General error
## 137 = OOMKilled (SIGKILL)
## 139 = Segfault (SIGSEGV)
## 143 = Graceful termination (SIGTERM)
03. Get Container Logs from the Last 5 Minutes¶
Retrieve the logs from a specific container, showing only entries from the last 5 minutes, with timestamps.
Scenario:¶
- An application started misbehaving 5 minutes ago, and you need to see only the recent log entries.
- Including timestamps helps correlate log events with external incidents.
Hint: Use crictl logs with --since and --timestamps flags. You will need to calculate the timestamp for 5 minutes ago.
Solution
## First, identify the container you want to inspect
crictl ps
## Note the container ID from the output
## Calculate the timestamp for 5 minutes ago and get logs
## On most Linux systems:
SINCE=$(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%SZ)
crictl logs --since "$SINCE" --timestamps <container-id>
## On systems without GNU date (e.g., Alpine / BusyBox):
## Use a specific timestamp instead
crictl logs --since "2024-01-15T10:25:00Z" --timestamps <container-id>
## Alternative: get the last 100 lines with timestamps
## (useful when you are not sure of the exact time range)
crictl logs --tail 100 --timestamps <container-id>
## Follow logs from now with timestamps (real-time monitoring)
crictl logs --follow --timestamps --tail 0 <container-id>
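The `-d '5 minutes ago'` flag used above is GNU-specific, which bites when a node image ships a BusyBox or BSD userland. One portable sketch computes epoch seconds first (assumes the runtime shell is bash or another POSIX shell):

```shell
# --since needs an RFC 3339 timestamp. Computing it from epoch seconds
# avoids GNU date's -d '5 minutes ago' parsing, which BusyBox/BSD lack.
# -d @N is the GNU/BusyBox form; -r N is the BSD/macOS fallback.
EPOCH=$(( $(date +%s) - 300 ))
SINCE=$(date -u -d "@$EPOCH" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
  || date -u -r "$EPOCH" +%Y-%m-%dT%H:%M:%SZ)
echo "$SINCE"
```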
04. Execute a Command Inside a Running Container¶
Use crictl exec to check the environment variables and network configuration of a running container.
Scenario:¶
- An application is failing to connect to a database, and you suspect the environment variables are misconfigured.
- You need to verify both the environment variables and the network connectivity from inside the container.
Hint: Use crictl exec with commands like env, cat /etc/resolv.conf, and wget or curl.
Solution
## First, identify the running container
crictl ps
## Check environment variables inside the container
crictl exec <container-id> env
## Filter for specific environment variables (e.g., database-related)
crictl exec <container-id> env | grep -i db
## Check DNS configuration
crictl exec <container-id> cat /etc/resolv.conf
## Check the container's IP address
crictl exec <container-id> ip addr show 2>/dev/null || \
crictl exec <container-id> cat /etc/hosts
## Test connectivity to a database service
crictl exec <container-id> wget -qO- --timeout=5 http://db-service:5432 2>&1 || \
echo "Connection test completed (non-zero exit may be expected for non-HTTP services)"
## Test DNS resolution
crictl exec <container-id> nslookup db-service.default.svc.cluster.local 2>/dev/null || \
crictl exec <container-id> cat /etc/hosts
## Check listening ports inside the container
crictl exec <container-id> netstat -tlnp 2>/dev/null || \
crictl exec <container-id> ss -tlnp 2>/dev/null || \
echo "Neither netstat nor ss available in container"
## Open an interactive shell for more thorough investigation
crictl exec -it <container-id> /bin/sh
05. Pull an Image and Verify It Is Available¶
Pull the alpine:3.19 image using crictl and verify it is available in the local container runtime image store.
Scenario:¶
- You need to pre-pull images on a node before deploying pods to reduce startup time.
- You want to verify that a specific image version is available on the node.
Hint: Use crictl pull to download the image and crictl images to verify it is present.
Solution
## Check if the image is already available
crictl images | grep alpine
## Pull the specific image
crictl pull alpine:3.19
## Verify the image was pulled successfully
crictl images | grep alpine
## Get detailed information about the pulled image
crictl inspecti alpine:3.19 | jq '{
size: .info.size,
entrypoint: .info.imageSpec.config.Entrypoint,
cmd: .info.imageSpec.config.Cmd,
env: .info.imageSpec.config.Env
}'
## Verify by pulling with the full registry path
crictl pull docker.io/library/alpine:3.19
## List all alpine images with their digests
crictl images --digests | grep alpine
## Show image filesystem usage to see the impact of the new image
crictl imagefsinfo | jq '.status.usedBytes'
06. Find the Container Runtime Version and Socket Path¶
Determine the container runtime version, the CRI socket path, and whether the runtime is healthy.
Scenario:¶
- You are troubleshooting node issues and need to verify that the container runtime is functioning correctly.
- You need to document the runtime version for compliance and compatibility purposes.
Hint: Use crictl version, crictl info, and check the configuration file at /etc/crictl.yaml.
Solution
## Get the CRI and runtime version
crictl version
## Sample output:
## Version: 0.1.0
## RuntimeName: containerd
## RuntimeVersion: v1.7.11
## RuntimeApiVersion: v1
## Check the configured socket path
cat /etc/crictl.yaml
## Get detailed runtime information
crictl info | jq '{
runtime: .status,
cniConfig: .config.cni,
cgroupDriver: .config.containerd.runtimes.runc.options.SystemdCgroup
}'
## Check the runtime health (conditions)
crictl info | jq '.status.conditions[] | {type, status}'
## Verify the socket file exists and check permissions
ls -la /run/containerd/containerd.sock
## Check the containerd (or CRI-O) service status
systemctl status containerd 2>/dev/null || systemctl status crio 2>/dev/null
## Get the runtime's build information
containerd --version 2>/dev/null || crio --version 2>/dev/null
07. Get Resource Usage Stats for All Running Pods¶
Display the CPU and memory usage for all running pods on this node.
Scenario:¶
- You suspect a noisy-neighbor problem where one pod is consuming excessive resources.
- You need a quick overview of resource consumption on this specific node.
Hint: Use crictl statsp to get pod-level stats and crictl stats for container-level stats.
Solution
## Show resource usage for all pods
crictl statsp
## Show resource usage for all containers (more granular)
crictl stats
## Get pod stats in JSON format for analysis
crictl statsp -o json | jq '.stats[] | {
  podId: .attributes.id[0:12],
  podName: .attributes.labels["io.kubernetes.pod.name"],
  namespace: .attributes.labels["io.kubernetes.pod.namespace"],
  cpu: .linux.cpu.usageCoreNanoSeconds.value,
  memoryMB: ((.linux.memory.workingSetBytes.value | tonumber) / 1048576)
}' | head -50
## Get container stats in JSON format for analysis
crictl stats -o json | jq '.stats[] | {
  containerId: .attributes.id[0:12],
  containerName: .attributes.labels["io.kubernetes.container.name"],
  cpu: .cpu.usageCoreNanoSeconds.value,
  memoryMB: ((.memory.workingSetBytes.value | tonumber) / 1048576)
}'
## Monitor stats in real-time with watch
watch -n 5 crictl statsp
## Sort containers by memory usage (highest first)
crictl stats -o json | jq '[.stats[] | {
  name: .attributes.labels["io.kubernetes.container.name"],
  memoryMB: ((.memory.workingSetBytes.value | tonumber) / 1048576)
}] | sort_by(.memoryMB) | reverse'
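The jq pipelines above are easier to reason about against a fixed input. Here is the same sort-by-memory pattern run on a hand-written sample (container names and byte counts are invented), so you can verify the logic without a cluster. The `tonumber` handles the CRI convention of serializing 64-bit counters as JSON strings, and is harmless if your runtime already emits numbers:

```shell
# Sort sample stats by memory, descending - same shape as crictl stats -o json.
echo '{"stats":[
  {"attributes":{"labels":{"io.kubernetes.container.name":"app"}},
   "memory":{"workingSetBytes":{"value":"209715200"}}},
  {"attributes":{"labels":{"io.kubernetes.container.name":"sidecar"}},
   "memory":{"workingSetBytes":{"value":"52428800"}}}
]}' | jq '[.stats[] | {
  name: .attributes.labels["io.kubernetes.container.name"],
  memoryMB: ((.memory.workingSetBytes.value | tonumber) / 1048576 | floor)
}] | sort_by(.memoryMB) | reverse'
```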
08. Inspect a Pod’s Network Configuration¶
Find the IP address, DNS settings, and network namespace of a specific pod using crictl.
Scenario:¶
- A pod cannot communicate with other pods, and you need to verify its network configuration.
- You need the network namespace path to run low-level debugging tools like tcpdump.
Hint: Use crictl inspectp with jq to extract the network information, and nsenter to inspect the network namespace.
Solution
## First, identify the pod sandbox
crictl pods --name <pod-name>
## Note the POD ID from the output
## Get the pod's IP address
crictl inspectp <pod-id> | jq '.status.network'
## Expected output:
## { "additionalIps": [], "ip": "10.244.0.15" }
## Get the pod's full network namespace information
crictl inspectp <pod-id> | jq '.info.runtimeSpec.linux.namespaces[] | select(.type=="network")'
## Get the network namespace path
NETNS=$(crictl inspectp <pod-id> | jq -r '.info.runtimeSpec.linux.namespaces[] | select(.type=="network") | .path')
echo "Network namespace: $NETNS"
## Enter the pod's network namespace and inspect network interfaces
sudo nsenter --net=$NETNS ip addr show
## Check routes in the pod's network namespace
sudo nsenter --net=$NETNS ip route show
## Check DNS resolution from the pod's perspective
sudo nsenter --net=$NETNS cat /etc/resolv.conf
## Run tcpdump in the pod's network namespace (for packet capture)
sudo nsenter --net=$NETNS tcpdump -i eth0 -c 10
## Check iptables rules in the pod's namespace
sudo nsenter --net=$NETNS iptables -L -n -v
## Test connectivity from the pod's namespace
sudo nsenter --net=$NETNS ping -c 3 <target-ip>
09. Find Which Container Is Using the Most Memory¶
Identify the container on this node that is consuming the most memory.
Scenario:¶
- The node is under memory pressure and pods are being evicted.
- You need to quickly identify the biggest memory consumer to take corrective action.
Hint: Use crictl stats with JSON output and jq to sort by memory usage.
Solution
## Get stats in JSON and sort by memory (highest first)
crictl stats -o json | jq '[.stats[] | {
  id: .attributes.id[0:12],
  name: .attributes.labels["io.kubernetes.container.name"],
  pod: .attributes.labels["io.kubernetes.pod.name"],
  namespace: .attributes.labels["io.kubernetes.pod.namespace"],
  memoryMB: (((.memory.workingSetBytes.value // 0) | tonumber) / 1048576 | floor)
}] | sort_by(.memoryMB) | reverse | .[0:10]'
## Quick one-liner to find the top memory consumer
crictl stats -o json | jq '[.stats[] | {
  name: .attributes.labels["io.kubernetes.container.name"],
  memoryMB: (((.memory.workingSetBytes.value // 0) | tonumber) / 1048576 | floor)
}] | sort_by(.memoryMB) | reverse | .[0]'
## After identifying the container, inspect its memory limits
crictl inspect <container-id> | jq '.info.runtimeSpec.linux.resources.memory'
## Check if the container has an OOMKilled history
## Look for exited containers with exit code 137
crictl ps -a --state Exited -o json | jq '.containers[] | select(.metadata.name == "<container-name>")' | head -20
## Get the PID of the top consumer and check its memory map on the host
PID=$(crictl inspect <container-id> | jq '.info.pid')
cat /proc/$PID/status | grep -i vmrss
10. Debug a CrashLoopBackOff Pod Using crictl¶
A pod is in CrashLoopBackOff. Use crictl to investigate the root cause by examining container exit codes and logs.
Scenario:¶
- kubectl describe pod shows the pod is in CrashLoopBackOff, but the events do not give enough detail.
- You need to inspect the actual container exit codes and logs from previous crash instances to understand the failure.
Hint: Use crictl ps -a to find all container instances (including exited ones), then use crictl inspect for exit codes and crictl logs for the crash output.
Solution
## Step 1: Find the pod sandbox for the crashing pod
crictl pods --name <pod-name>
## Note the POD ID
## Step 2: List ALL containers for this pod (including exited ones)
crictl ps -a --pod <pod-id>
## You should see multiple instances with increasing ATTEMPT numbers:
## CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
## aaa111bbb222 myapp 30s ago Running myapp 5 pod-id...
## ccc333ddd444 myapp 2m ago Exited myapp 4 pod-id...
## eee555fff666 myapp 4m ago Exited myapp 3 pod-id...
## Step 3: Inspect an exited container to find the exit code
crictl inspect ccc333ddd444 | jq '{
name: .status.metadata.name,
state: .status.state,
exitCode: .status.exitCode,
reason: .status.reason,
message: .status.message,
startedAt: .status.startedAt,
finishedAt: .status.finishedAt
}'
## Step 4: View the logs from the exited (crashed) container
crictl logs ccc333ddd444
## Step 5: View logs from the oldest crash instance for comparison
crictl logs eee555fff666
## Step 6: Check if it is an OOMKill (exit code 137)
EXIT_CODE=$(crictl inspect ccc333ddd444 | jq '.status.exitCode')
if [ "$EXIT_CODE" == "137" ]; then
echo "Container was OOMKilled!"
echo "Check memory limits:"
crictl inspect ccc333ddd444 | jq '.info.runtimeSpec.linux.resources.memory'
echo "Check current memory usage of the running instance:"
RUNNING_CID=$(crictl ps --pod <pod-id> -q)
crictl stats --id $RUNNING_CID
fi
## Step 7: Check if it is a command-not-found issue (exit code 127)
if [ "$EXIT_CODE" == "127" ]; then
echo "Command not found inside container!"
echo "Check the entrypoint/command:"
crictl inspect ccc333ddd444 | jq '.info.runtimeSpec.process.args'
fi
## Step 8: If it is a configuration issue, check env vars
crictl inspect ccc333ddd444 | jq '.info.runtimeSpec.process.env'
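The exit-code checks in steps 6-7 generalize into a small triage helper. This is just a sketch (not part of crictl) mapping the common codes to their most likely cause:

```shell
# Map common container exit codes to a likely root cause.
triage() {
  case "$1" in
    0)   echo "clean exit - check restartPolicy and app logic" ;;
    1)   echo "general application error - read the logs" ;;
    126) echo "command found but not executable - check permissions" ;;
    127) echo "command not found - check image entrypoint/args" ;;
    137) echo "SIGKILL - likely OOMKilled, check memory limits" ;;
    139) echo "SIGSEGV - the process crashed" ;;
    143) echo "SIGTERM - graceful termination" ;;
    *)   echo "unmapped exit code: $1" ;;
  esac
}
triage 137   # SIGKILL - likely OOMKilled, check memory limits
```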
11. Clean Up Unused Images to Free Disk Space¶
Remove all container images that are not currently referenced by any container on the node.
Scenario:¶
- The node's disk is running low because old images from previous deployments are still cached.
- You need to safely clean up unused images without affecting running workloads.
Hint: Use crictl rmi --prune to remove images not referenced by any container. Use crictl imagefsinfo to check disk usage before and after.
Solution
## Step 1: Check current disk usage by images
crictl imagefsinfo | jq '{
  usedBytes: .status.usedBytes.value,
  usedMB: ((.status.usedBytes.value | tonumber) / 1048576 | floor),
  filesystem: .status.fsId
}'
## Step 2: List all images to see what is on the node
crictl images
## Step 3: Count total images
echo "Total images: $(crictl images -q | wc -l)"
## Step 4: Count running containers (to understand which images are in use)
echo "Running containers: $(crictl ps -q | wc -l)"
## Step 5: Prune unused images (safe - only removes unreferenced images)
crictl rmi --prune
## Step 6: Verify the cleanup
crictl images
echo "Remaining images: $(crictl images -q | wc -l)"
## Step 7: Check disk usage after cleanup
crictl imagefsinfo | jq '{
  usedBytes: .status.usedBytes.value,
  usedMB: ((.status.usedBytes.value | tonumber) / 1048576 | floor)
}'
## Note: If you need to remove a specific old image
crictl rmi <image-name>:<tag>
## To see which images are in use by running containers:
for CID in $(crictl ps -q); do
crictl inspect $CID 2>/dev/null | jq -r '.status.image.image'
done | sort -u
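Conceptually, `--prune` computes a set difference: images present on the node minus images referenced by a container. The core logic can be sketched on plain text, no cluster required (the image names below are examples):

```shell
# Images on the node vs. images referenced by running containers.
ALL_IMAGES='nginx:alpine
alpine:3.19
busybox:latest'
IN_USE='nginx:alpine'
printf '%s\n' "$ALL_IMAGES" | sort > /tmp/all-images.txt
printf '%s\n' "$IN_USE"     | sort > /tmp/in-use.txt
# comm -23 prints lines unique to the first file: the prune candidates
comm -23 /tmp/all-images.txt /tmp/in-use.txt
# alpine:3.19
# busybox:latest
```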
12. Create a Pod Sandbox and Container Manually with crictl (Advanced)¶
Create a pod sandbox and run a container inside it manually using crictl. This simulates what the kubelet does when it creates a pod.
Scenario:¶
- You want to understand the pod lifecycle at the runtime level.
- You need to test whether the container runtime can create and run containers independently of the kubelet (to isolate issues).
Hint: Create JSON config files for the pod sandbox and container, then use crictl runp, crictl create, and crictl start.
Solution
## Step 1: Create the pod sandbox configuration
cat <<EOF > /tmp/test-sandbox.json
{
"metadata": {
"name": "test-pod",
"namespace": "default",
"attempt": 1,
"uid": "test-pod-uid-$(date +%s)"
},
"log_directory": "/tmp/test-pod-logs",
"linux": {}
}
EOF
## Create the log directory
mkdir -p /tmp/test-pod-logs
## Step 2: Create the pod sandbox
SANDBOX_ID=$(crictl runp /tmp/test-sandbox.json)
echo "Sandbox ID: $SANDBOX_ID"
## Step 3: Verify the sandbox is running
crictl pods --id $SANDBOX_ID
## Step 4: Create the container configuration
cat <<EOF > /tmp/test-container.json
{
"metadata": {
"name": "test-nginx"
},
"image": {
"image": "docker.io/library/nginx:alpine"
},
"log_path": "test-nginx.log",
"linux": {}
}
EOF
## Step 5: Pull the image if not already available
crictl pull nginx:alpine
## Step 6: Create the container (returns container ID)
CONTAINER_ID=$(crictl create $SANDBOX_ID /tmp/test-container.json /tmp/test-sandbox.json)
echo "Container ID: $CONTAINER_ID"
## Step 7: Start the container
crictl start $CONTAINER_ID
## Step 8: Verify the container is running
crictl ps --id $CONTAINER_ID
## Step 9: Test the container
crictl exec $CONTAINER_ID nginx -v
crictl exec $CONTAINER_ID curl -s http://localhost:80 2>/dev/null || \
crictl exec $CONTAINER_ID wget -qO- http://localhost:80
## Step 10: View container logs
crictl logs $CONTAINER_ID
## Step 11: Inspect the container
crictl inspect $CONTAINER_ID | jq '{
state: .status.state,
image: .status.image.image,
pid: .info.pid,
startedAt: .status.startedAt
}'
## Step 12: Clean up everything
crictl stop $CONTAINER_ID
crictl rm $CONTAINER_ID
crictl stopp $SANDBOX_ID
crictl rmp $SANDBOX_ID
## Verify cleanup
crictl pods --id $SANDBOX_ID
crictl ps -a --id $CONTAINER_ID
## Remove temp files
rm -f /tmp/test-sandbox.json /tmp/test-container.json
rm -rf /tmp/test-pod-logs
Finalize & Cleanup¶
- If you created any manual pods or containers during this lab, clean them up:
## List all manually created sandboxes (ones not managed by kubelet)
## These typically have simple names like "debug-sandbox" or "test-pod"
crictl pods
## Stop and remove any manual sandboxes
## Replace <sandbox-id> with the actual ID
crictl stopp <sandbox-id>
crictl rmp <sandbox-id>
## Clean up any stopped containers that you created
crictl ps -a --state Exited
## Remove stopped containers by ID
crictl rm <container-id>
- If you pulled test images that you no longer need:
## Remove specific test images
crictl rmi busybox:latest
crictl rmi alpine:3.19
crictl rmi nginx:alpine
## Or prune all unused images
crictl rmi --prune
- If you modified the crictl configuration (e.g., enabled debug mode), restore it:
## Restore the standard crictl configuration
sudo tee /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
- Remove any temporary files created during the lab:
## Clean up temporary config files
rm -f /tmp/pod-sandbox-config.json
rm -f /tmp/container-config.json
rm -f /tmp/test-sandbox.json
rm -f /tmp/test-container.json
rm -rf /tmp/debug-sandbox-logs
rm -rf /tmp/test-pod-logs
Troubleshooting¶
- “crictl: command not found”:
Ensure crictl is installed and in your PATH. Check the installation step or verify the binary location:
## Check if crictl exists anywhere on the system
find / -name "crictl" -type f 2>/dev/null
## If found but not in PATH, add it
export PATH=$PATH:/usr/local/bin
- “failed to connect: connection error: … /run/containerd/containerd.sock: no such file or directory”:
The container runtime socket does not exist at the configured path. Check which runtime is installed and update the configuration:
## Check which sockets exist on the system
ls -la /run/containerd/containerd.sock 2>/dev/null
ls -la /var/run/crio/crio.sock 2>/dev/null
ls -la /run/cri-dockerd.sock 2>/dev/null
## Check if containerd is running
systemctl status containerd
## Update /etc/crictl.yaml with the correct socket path
- “permission denied” when running crictl commands:
crictl needs access to the container runtime socket, which typically requires root privileges:
## Run with sudo
sudo crictl ps
## Or check socket permissions
ls -la /run/containerd/containerd.sock
## The socket should be owned by root:root with permissions srw-rw----
## You can add your user to the appropriate group if needed
- “crictl pods” returns empty output but pods are running (kubectl shows pods):
This usually means crictl is connected to the wrong runtime socket:
## Check which runtime the kubelet is using
ps aux | grep kubelet | grep container-runtime-endpoint
## Verify your crictl configuration matches
cat /etc/crictl.yaml
## Try specifying the endpoint explicitly
crictl --runtime-endpoint unix:///run/containerd/containerd.sock pods
- “context deadline exceeded” errors:
The runtime is not responding within the configured timeout. This could indicate the runtime is overloaded or having issues:
## Increase the timeout in /etc/crictl.yaml
sudo sed -i 's/timeout: 10/timeout: 30/' /etc/crictl.yaml
## Or pass a longer timeout on the command line
crictl --timeout 30 ps
## Check the runtime's health
systemctl status containerd
journalctl -u containerd --since "5 minutes ago" --no-pager
- “jq: command not found” when trying to parse JSON output:
Install jq to parse JSON output from crictl:
## Debian/Ubuntu
sudo apt-get install -y jq
## RHEL/CentOS/Fedora
sudo yum install -y jq
## Alpine
sudo apk add jq
- “crictl info” shows RuntimeReady: false:
The container runtime is not healthy. Check the runtime service and its logs:
## Restart the container runtime
sudo systemctl restart containerd
## Check the runtime logs for errors
journalctl -u containerd --since "10 minutes ago" --no-pager
## Verify the runtime recovers
crictl info | jq '.status.conditions'
- “crictl info” shows NetworkReady: false:
The CNI plugin is not functioning correctly. Check the CNI configuration:
## Check CNI configuration files
ls -la /etc/cni/net.d/
## Check CNI binary directory
ls -la /opt/cni/bin/
## Check kubelet logs for CNI errors
journalctl -u kubelet --since "10 minutes ago" --no-pager | grep -i cni
## Restart the kubelet to re-initialize networking
sudo systemctl restart kubelet
Next Steps¶
- Explore nerdctl for a Docker-compatible CLI experience with containerd: nerdctl on GitHub.
- Learn about ctr, the low-level containerd CLI tool, for even deeper runtime debugging.
- Study the CRI specification to understand the full gRPC API that crictl uses: CRI API.
- Practice using nsenter together with crictl for advanced network debugging inside pod sandboxes.
- Explore the kubectl debug node/<node-name> command as an alternative way to get a shell on a node without SSH.
- Learn about Kubernetes node-level troubleshooting: Kubernetes Node Debugging Guide.
- Set up shell completion for crictl to speed up your debugging workflow: crictl completion bash > /etc/bash_completion.d/crictl.
- Explore the RuntimeClass feature to understand how Kubernetes supports multiple container runtimes on the same node.
kubectl Deep Dive¶
- Welcome to the kubectl deep-dive hands-on lab! This is not a beginner tutorial - it is a comprehensive, in-depth exploration of everything kubectl can do.
- You will master advanced output formatting, JSONPath expressions, resource patching strategies, interactive debugging, RBAC inspection, raw API access, plugin management, and performance optimization techniques.
- By the end of this lab, kubectl will feel like a natural extension of your hands when working with any Kubernetes cluster.
What will we learn?¶
- How kubectl communicates with the Kubernetes API server (the full request lifecycle)
- kubeconfig file structure: clusters, users, contexts, and merging multiple configs
- The Kubernetes API resource model (Group, Version, Resource)
- All output formats: JSON, YAML, wide, name, JSONPath, custom-columns, go-template
- Advanced JSONPath expressions for filtering, sorting, and extracting data
- Field selectors, label selectors, and advanced get operations
- Resource inspection with describe, explain, api-resources, and api-versions
- Declarative vs. imperative resource management (apply vs. create vs. replace)
- Server-side apply, dry-run modes, and kubectl diff
- All three patching strategies: strategic merge, JSON merge, and JSON patch
- Interactive debugging: exec, cp, port-forward, attach, and debug
- Operational commands: wait, rollout, and autoscale
- RBAC inspection with auth can-i, auth whoami, and auth reconcile
- Extending kubectl with plugins and krew
- Raw API access via kubectl proxy and token-based requests
- Performance tips: watch mode, bash completion, aliases, and resource caching
Official Documentation & References¶
Introduction¶
How kubectl Works¶
- kubectl is the command-line tool for interacting with Kubernetes clusters.
- It does not communicate directly with nodes, pods, or containers. Instead, every single kubectl command translates into one or more HTTP requests to the Kubernetes API server.
- The flow is always the same: kubectl reads your kubeconfig file to determine which cluster to talk to, authenticates with the API server, sends the request, and displays the response.
The full request lifecycle looks like this:
- Read kubeconfig - kubectl locates the kubeconfig file (default: ~/.kube/config or $KUBECONFIG) and reads the current context.
- Resolve cluster and credentials - From the current context, kubectl determines the API server URL, the client certificate or token, and the CA bundle.
- Build HTTP request - The kubectl command is translated into an HTTP verb (GET, POST, PUT, PATCH, DELETE) against a REST endpoint (e.g., /api/v1/namespaces/default/pods).
- TLS handshake and authentication - kubectl establishes a TLS connection to the API server and presents its credentials.
- API server processing - The API server authenticates the request, checks authorization (RBAC), runs admission controllers, and reads from or writes to etcd.
- Response - The API server returns a JSON response, which kubectl formats according to your output flags and displays.
flowchart LR
A["kubectl<br/>(CLI)"] -->|"1. Read config"| B["kubeconfig<br/>(~/.kube/config)"]
B -->|"2. Cluster + Creds"| A
A -->|"3. HTTPS REST call"| C["kube-apiserver<br/>(Control Plane)"]
C -->|"4. AuthN / AuthZ /<br/>Admission"| C
C -->|"5. Read / Write"| D["etcd<br/>(Data Store)"]
D -->|"6. Response data"| C
C -->|"7. JSON response"| A
A -->|"8. Formatted output"| E["Terminal<br/>(stdout)"]
style A fill:#326CE5,stroke:#fff,color:#fff
style B fill:#F5A623,stroke:#fff,color:#fff
style C fill:#326CE5,stroke:#fff,color:#fff
style D fill:#4DB33D,stroke:#fff,color:#fff
style E fill:#666,stroke:#fff,color:#fff
Every kubectl command is just an API call
Understanding this is the single most important insight for mastering kubectl. When you run kubectl get pods, you are sending GET /api/v1/namespaces/default/pods to the API server. When you run kubectl apply -f deployment.yaml, you are sending a PATCH or POST request. This mental model will help you debug every issue you encounter.
kubeconfig File Structure¶
The kubeconfig file is a YAML file with three main sections: clusters, users, and contexts. A context binds a cluster to a user (and optionally a namespace).
apiVersion: v1
kind: Config
## The currently active context
current-context: my-cluster-context
## Cluster definitions - where to connect
clusters:
- name: production-cluster
cluster:
## The API server URL
server: https://k8s-prod.example.com:6443
## CA certificate to verify the API server's TLS cert
certificate-authority-data: LS0tLS1CRUdJTi...base64...
- name: staging-cluster
cluster:
server: https://k8s-staging.example.com:6443
certificate-authority-data: LS0tLS1CRUdJTi...base64...
## User credentials - how to authenticate
users:
- name: admin-user
user:
## Client certificate authentication
client-certificate-data: LS0tLS1CRUdJTi...base64...
client-key-data: LS0tLS1CRUdJTi...base64...
- name: dev-user
user:
## Token-based authentication
token: eyJhbGciOiJSUzI1Ni...
## Contexts bind a cluster + user + optional namespace
contexts:
- name: my-cluster-context
context:
cluster: production-cluster
user: admin-user
namespace: default
- name: staging-context
context:
cluster: staging-cluster
user: dev-user
namespace: staging
kubeconfig supports multiple authentication methods
Besides client certificates and tokens, kubeconfig supports: exec-based credential plugins (e.g., aws-iam-authenticator, gke-gcloud-auth-plugin), OIDC tokens, username/password (deprecated), and auth provider plugins.
API Resource Model¶
Every Kubernetes resource belongs to an API Group, has a Version, and is identified by its Resource type (GVR). Understanding GVR is essential for advanced kubectl usage.
| Component | Description | Examples |
|---|---|---|
| Group | A logical collection of related resources | "" (core), apps, batch, networking.k8s.io |
| Version | The API version within a group | v1, v1beta1, v2 |
| Resource | The actual resource type | pods, deployments, services, ingresses |
The REST path for a resource follows this pattern:
- Core group: /api/v1/namespaces/{ns}/{resource}
- Named group: /apis/{group}/{version}/namespaces/{ns}/{resource}
For example:
- Pods: /api/v1/namespaces/default/pods (core group, so just /api/v1)
- Deployments: /apis/apps/v1/namespaces/default/deployments (apps group)
- Ingresses: /apis/networking.k8s.io/v1/namespaces/default/ingresses
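The two patterns above can be captured in a tiny helper function (hypothetical, for illustration only - kubectl does this resolution for you):

```shell
# Build the REST path for a Group/Version/Resource in a namespace.
# An empty group means the core group, which lives under /api instead of /apis.
rest_path() {
  local group=$1 version=$2 ns=$3 resource=$4
  if [ -z "$group" ]; then
    echo "/api/${version}/namespaces/${ns}/${resource}"
  else
    echo "/apis/${group}/${version}/namespaces/${ns}/${resource}"
  fi
}
rest_path "" v1 default pods              # /api/v1/namespaces/default/pods
rest_path apps v1 default deployments     # /apis/apps/v1/namespaces/default/deployments
```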
Verbosity Levels¶
kubectl supports verbosity levels from -v=0 to -v=9. These are invaluable for debugging:
| Level | What it shows |
|---|---|
| -v=0 | Default output only |
| -v=1 | Adds the HTTP method used (GET, POST, etc.) |
| -v=2 | Adds timing information for API calls |
| -v=3 | Adds extended information about changes |
| -v=4 | Adds debug-level verbosity |
| -v=5 | Adds trace-level verbosity |
| -v=6 | Shows the full HTTP request URL |
| -v=7 | Shows HTTP request headers |
| -v=8 | Shows the HTTP request body |
| -v=9 | Shows the full HTTP response body (untruncated) - everything the API server returns |
Use -v=6 or higher to debug API calls
When something is not working as expected, -v=6 is the sweet spot. It shows you the exact URL being called without flooding you with body content. Use -v=9 only when you need to see the full response payload.
## See the exact API URL being called
kubectl get pods -v=6
## See full request and response headers
kubectl get pods -v=7
## See the complete request body (useful for apply/patch debugging)
kubectl apply -f deployment.yaml -v=8
## See the complete response body (the raw JSON from the API server)
kubectl get pods -v=9
Prerequisites¶
- A running Kubernetes cluster (minikube, kind, Docker Desktop, or a remote cluster)
- kubectl installed and configured (version 1.25 or higher recommended)
- Basic familiarity with Kubernetes concepts (pods, deployments, services, namespaces)
- Terminal access (bash or zsh)
Verify your environment¶
## Check kubectl version (client and server)
kubectl version
## Check cluster connectivity
kubectl cluster-info
## Check that you have at least one node ready
kubectl get nodes
Lab¶
Step 01 - kubeconfig Mastery¶
- In this step you will learn to manage multiple cluster configurations, switch contexts, merge kubeconfig files, and use the KUBECONFIG environment variable.
View your current kubeconfig¶
## Display the full kubeconfig (with sensitive data redacted)
kubectl config view
## Display the full kubeconfig with secrets visible
kubectl config view --raw
## Show only the current context name
kubectl config current-context
Work with contexts¶
## List all available contexts
kubectl config get-contexts
## Switch to a different context
kubectl config use-context <context-name>
## Show details about a specific context
kubectl config get-contexts <context-name>
## Set a default namespace for the current context
## This avoids having to pass -n <namespace> on every command
kubectl config set-context --current --namespace=kubectl-lab
Create and manage contexts manually¶
## Add a new cluster entry
kubectl config set-cluster my-new-cluster \
--server=https://k8s.example.com:6443 \
--certificate-authority=/path/to/ca.crt
## Add a new user entry with token authentication
kubectl config set-credentials my-user \
--token=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...
## Add a new user entry with client certificate authentication
kubectl config set-credentials cert-user \
--client-certificate=/path/to/client.crt \
--client-key=/path/to/client.key
## Create a new context that binds cluster + user + namespace
kubectl config set-context my-context \
--cluster=my-new-cluster \
--user=my-user \
--namespace=default
## Delete a context
kubectl config delete-context my-context
## Delete a cluster entry
kubectl config delete-cluster my-new-cluster
## Delete a user entry
kubectl config delete-user my-user
Merge multiple kubeconfig files¶
## The KUBECONFIG environment variable accepts a colon-separated list of files
## kubectl merges them at runtime (the first file is the default for writes)
export KUBECONFIG=~/.kube/config:~/.kube/cluster-2.config:~/.kube/cluster-3.config
## Verify that contexts from all files are visible
kubectl config get-contexts
## Permanently merge multiple files into one
## This creates a single flat file with all clusters, users, and contexts
KUBECONFIG=~/.kube/config:~/.kube/cluster-2.config kubectl config view \
--flatten > ~/.kube/merged-config
## Use the merged config
export KUBECONFIG=~/.kube/merged-config
Use KUBECONFIG per terminal session
You can set KUBECONFIG per shell session to isolate cluster access. This is safer than having all clusters in one file, because you cannot accidentally run commands against the wrong cluster.
One-off context override without switching¶
## Run a command against a different context without switching
kubectl get pods --context=staging-context
## Run a command in a specific namespace without changing default
kubectl get pods --namespace=kube-system
## Combine both overrides
kubectl get pods --context=production-context --namespace=monitoring
Step 02 - Output Formatting Mastery¶
- kubectl supports many output formats. Mastering them transforms you from someone who scans walls of terminal output to someone who extracts exactly the data they need.
Set up the lab namespace and resources¶
## Create the lab namespace
kubectl apply -f manifests/namespace.yaml
## Deploy sample resources
kubectl apply -f manifests/sample-deployment.yaml
## Wait for the deployment to be ready
kubectl wait --for=condition=available deployment/nginx-lab \
-n kubectl-lab --timeout=120s
All output formats¶
## Default table output (human-readable)
kubectl get pods -n kubectl-lab
## Wide output - shows additional columns (node name, IP, etc.)
kubectl get pods -n kubectl-lab -o wide
## YAML output - the full resource definition as YAML
kubectl get pods -n kubectl-lab -o yaml
## JSON output - the full resource definition as JSON
kubectl get pods -n kubectl-lab -o json
## Name-only output - just the resource type/name (great for scripting)
kubectl get pods -n kubectl-lab -o name
## JSONPath output - extract specific fields using JSONPath expressions
kubectl get pods -n kubectl-lab \
-o jsonpath='{.items[*].metadata.name}'
## Custom columns - define your own tabular output
kubectl get pods -n kubectl-lab \
-o custom-columns='NAME:.metadata.name,STATUS:.status.phase,IP:.status.podIP'
## Go template - use Go template syntax for complex formatting
kubectl get pods -n kubectl-lab \
-o go-template='{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}'
YAML and JSON output details¶
## Get a single pod as YAML (shows the complete spec including defaults)
kubectl get pod -n kubectl-lab -l app=nginx-lab -o yaml | head -80
## Get a single pod as JSON and pipe to jq for pretty formatting
kubectl get pod -n kubectl-lab -l app=nginx-lab -o json | jq '.items[0].metadata'
## Get just the spec section of a deployment
kubectl get deployment nginx-lab -n kubectl-lab -o json | jq '.spec'
## Get the status section of all pods
kubectl get pods -n kubectl-lab -o json | jq '.items[].status.phase'
Name output for scripting¶
## Get just pod names (useful for loops)
kubectl get pods -n kubectl-lab -o name
## Output: pod/nginx-lab-xxxx-yyyy
## Use in a loop to describe each pod
for pod in $(kubectl get pods -n kubectl-lab -o name); do
echo "=== $pod ==="
kubectl describe "$pod" -n kubectl-lab | head -20
done
## Delete all pods matching a label (using -o name)
## (Dry run - remove --dry-run=client to actually delete)
kubectl get pods -n kubectl-lab -l app=nginx-lab -o name | \
xargs kubectl delete --dry-run=client
Step 03 - JSONPath Deep Dive¶
- JSONPath is a query language for JSON. kubectl’s JSONPath implementation lets you extract, filter, and format data from API responses with surgical precision.
JSONPath syntax reference¶
| Expression | Meaning |
|---|---|
| `$` | The root object (implicit in kubectl, can be omitted) |
| `.field` | Child field access |
| `[n]` | Array index (0-based) |
| `[*]` | All elements of an array |
| `[start:end]` | Array slice |
| `[?(@.field==x)]` | Filter expression - select elements where the condition is true |
| `..field` | Recursive descent - find the field at any depth |
| `{"\n"}` | Newline character (for formatting) |
| `{"\t"}` | Tab character (for formatting) |
| `{range}{end}` | Iterate over array elements |
Basic JSONPath expressions¶
## Get all pod names as a space-separated list
kubectl get pods -n kubectl-lab \
-o jsonpath='{.items[*].metadata.name}'
## Get the first pod's name
kubectl get pods -n kubectl-lab \
-o jsonpath='{.items[0].metadata.name}'
## Get all pod IPs
kubectl get pods -n kubectl-lab \
-o jsonpath='{.items[*].status.podIP}'
## Get pod names with newlines between them
kubectl get pods -n kubectl-lab \
-o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
Formatted multi-field output¶
## Get pod name and status on each line
kubectl get pods -n kubectl-lab \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
## Get pod name, node name, and pod IP (tab-separated)
kubectl get pods -n kubectl-lab \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\t"}{.status.podIP}{"\n"}{end}'
## Get container images for all pods
kubectl get pods -n kubectl-lab \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
Filter expressions¶
## Get pods that are in Running phase
kubectl get pods -n kubectl-lab \
-o jsonpath='{.items[?(@.status.phase=="Running")].metadata.name}'
## Get pods scheduled on a specific node (replace NODE_NAME with an actual node)
NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl get pods -n kubectl-lab \
-o jsonpath="{.items[?(@.spec.nodeName==\"${NODE_NAME}\")].metadata.name}"
## Get containers with memory limit of 128Mi
kubectl get pods -n kubectl-lab \
-o jsonpath='{.items[*].spec.containers[?(@.resources.limits.memory=="128Mi")].name}'
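Filter semantics are easier to verify offline. This sketch mirrors the Running-phase filter above with an equivalent jq expression on a hand-written sample (the pod names and file path are illustrative):

```shell
## A trimmed-down pod list in the same shape the API returns
cat > /tmp/sample-pods.json <<'EOF'
{"items":[
  {"metadata":{"name":"web-1"},"status":{"phase":"Running"}},
  {"metadata":{"name":"web-2"},"status":{"phase":"Pending"}}
]}
EOF

## JSONPath {.items[?(@.status.phase=="Running")].metadata.name}
## selects the same element as this jq filter:
jq -r '.items[] | select(.status.phase=="Running") | .metadata.name' /tmp/sample-pods.json
```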
Nested and recursive descent¶
## Find all container names across all pods (recursive descent)
kubectl get pods -n kubectl-lab \
-o jsonpath='{..containers[*].name}'
## Find all image names using recursive descent
kubectl get pods -n kubectl-lab \
-o jsonpath='{..image}'
## Get all label values for the "app" key
kubectl get pods -n kubectl-lab \
-o jsonpath='{.items[*].metadata.labels.app}'
Real-world JSONPath examples¶
## Get all node names and their kernel versions
kubectl get nodes \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'
## Get all namespaces and their status
kubectl get namespaces \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
## Get all PVCs with their capacity and storage class
kubectl get pvc --all-namespaces \
-o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.resources.requests.storage}{"\t"}{.spec.storageClassName}{"\n"}{end}'
## Extract all unique container images running in the cluster
kubectl get pods --all-namespaces \
-o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | sort -u
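The de-duplication half of that pipeline is plain text processing and can be checked without a cluster (the image names below are just examples):

```shell
## tr turns the space-separated JSONPath output into one image per line,
## and sort -u collapses duplicates
printf 'nginx:1.25-alpine busybox:1.36 nginx:1.25-alpine' | tr ' ' '\n' | sort -u
```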
Step 04 - Custom Columns and Advanced Get Operations¶
- Custom columns give you full control over tabular output. Combined with field selectors, label selectors, and sort operations, `get` becomes an incredibly powerful query tool.
Custom columns¶
## Basic custom columns - pod name and status
kubectl get pods -n kubectl-lab \
-o custom-columns='NAME:.metadata.name,STATUS:.status.phase'
## Extended custom columns - name, node, IP, status, restarts
kubectl get pods -n kubectl-lab \
-o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,IP:.status.podIP,STATUS:.status.phase,RESTARTS:.status.containerStatuses[0].restartCount'
## Custom columns for deployments - name, replicas, available, image
kubectl get deployments -n kubectl-lab \
-o custom-columns='NAME:.metadata.name,DESIRED:.spec.replicas,AVAILABLE:.status.availableReplicas,IMAGE:.spec.template.spec.containers[0].image'
## Custom columns from a file (for reuse)
## Create a columns file:
## NAME NAMESPACE NODE STATUS
## .metadata.name .metadata.namespace .spec.nodeName .status.phase
## Then use it:
## kubectl get pods --all-namespaces -o custom-columns-file=columns.txt
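To make the commented file above concrete, here is one way to generate it (the path and chosen columns are arbitrary):

```shell
## First row: column headers. Second row: one JSONPath expression per column.
cat > /tmp/pod-columns.txt <<'EOF'
NAME            NAMESPACE            NODE            STATUS
.metadata.name  .metadata.namespace  .spec.nodeName  .status.phase
EOF

## Reuse it with any pod query:
## kubectl get pods --all-namespaces -o custom-columns-file=/tmp/pod-columns.txt
cat /tmp/pod-columns.txt
```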
Label selectors (-l / --selector)¶
## Select pods with a specific label
kubectl get pods -n kubectl-lab -l app=nginx-lab
## Select pods with multiple label requirements (AND logic)
kubectl get pods -n kubectl-lab -l app=nginx-lab,tier=frontend
## Select pods where a label exists (any value)
kubectl get pods -n kubectl-lab -l 'tier'
## Select pods where a label does NOT exist
kubectl get pods -n kubectl-lab -l '!tier'
## Select pods with set-based requirements
kubectl get pods -n kubectl-lab -l 'tier in (frontend, backend)'
kubectl get pods -n kubectl-lab -l 'environment notin (production)'
kubectl get pods -n kubectl-lab -l 'version in (v1, v2),tier=frontend'
## Show labels as columns in the output
kubectl get pods -n kubectl-lab --show-labels
## Show specific labels as extra columns
kubectl get pods -n kubectl-lab -L app,tier,version
Field selectors (--field-selector)¶
## Get pods by their status phase
kubectl get pods -n kubectl-lab --field-selector status.phase=Running
## Get pods NOT in Running state (find problematic pods)
kubectl get pods --all-namespaces --field-selector status.phase!=Running
## Get pods on a specific node
NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl get pods --all-namespaces --field-selector "spec.nodeName=${NODE_NAME}"
## Get a pod by exact name using a field selector
## (pods created by a Deployment have generated name suffixes, so an exact match like this may return nothing)
kubectl get pods -n kubectl-lab --field-selector metadata.name=nginx-lab
## Combine field selectors (AND logic)
kubectl get pods --all-namespaces \
--field-selector 'status.phase=Running,metadata.namespace!=kube-system'
## Get events for a specific namespace
kubectl get events -n kubectl-lab --field-selector type=Warning
Field selectors are limited
Not all fields support field selectors, and the supported set varies by resource type. For pods, the commonly supported fields are metadata.name, metadata.namespace, spec.nodeName, and status.phase. There is no built-in command to list them: passing an unsupported field simply returns a "field label not supported" error from the API server.
Sorting¶
## Sort pods by creation timestamp (oldest first)
kubectl get pods -n kubectl-lab --sort-by='.metadata.creationTimestamp'
## Sort pods by restart count (ascending)
kubectl get pods -n kubectl-lab \
--sort-by='.status.containerStatuses[0].restartCount'
## Sort nodes by capacity CPU
kubectl get nodes --sort-by='.status.capacity.cpu'
## Sort events by last timestamp
kubectl get events -n kubectl-lab --sort-by='.lastTimestamp'
## Sort namespaces by name
kubectl get namespaces --sort-by='.metadata.name'
All-namespaces flag¶
## List pods across ALL namespaces
kubectl get pods --all-namespaces
kubectl get pods -A ## shorthand
## Combine with selectors and output formatting
kubectl get pods -A -l tier=frontend -o wide
## Get all services across all namespaces with custom columns
kubectl get svc -A \
-o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,TYPE:.spec.type,CLUSTER-IP:.spec.clusterIP'
Step 05 - Resource Inspection¶
- Beyond `get`, kubectl provides powerful commands for understanding resource schemas, discovering API capabilities, and deep-inspecting running resources.
kubectl describe¶
## Describe a pod - shows events, conditions, container status, volumes, etc.
kubectl describe pod -l app=nginx-lab -n kubectl-lab
## Describe a deployment - shows rollout status, replica sets, conditions
kubectl describe deployment nginx-lab -n kubectl-lab
## Describe a node - shows capacity, allocatable, conditions, running pods
kubectl describe node $(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
## Describe a namespace
kubectl describe namespace kubectl-lab
## Describe a service
kubectl describe service nginx-lab -n kubectl-lab
describe vs get -o yaml
describe gives you a human-readable summary with events and computed status. get -o yaml gives you the raw API object, which is useful for programmatic access and for understanding exactly what is stored in etcd. Use describe for debugging and get -o yaml for full detail.
kubectl explain¶
## Explain a resource type (top-level fields)
kubectl explain pod
## Explain a specific field
kubectl explain pod.spec
## Explain a deeply nested field
kubectl explain pod.spec.containers
## Explain with full recursive output (shows ALL fields at all levels)
kubectl explain pod.spec.containers --recursive
## Explain deployment spec
kubectl explain deployment.spec
## Explain deployment strategy (shows the rollingUpdate fields)
kubectl explain deployment.spec.strategy
## Explain a CRD resource (if installed)
## kubectl explain myresource.spec
## Specify an API version if multiple versions exist
kubectl explain deployment --api-version=apps/v1
kubectl api-resources¶
## List ALL available resource types in the cluster
kubectl api-resources
## Show only namespaced resources
kubectl api-resources --namespaced=true
## Show only cluster-scoped resources
kubectl api-resources --namespaced=false
## Show resources in a specific API group
kubectl api-resources --api-group=apps
kubectl api-resources --api-group=batch
kubectl api-resources --api-group=networking.k8s.io
kubectl api-resources --api-group=rbac.authorization.k8s.io
## Show resources that support a specific verb
kubectl api-resources --verbs=list
kubectl api-resources --verbs=create,delete
kubectl api-resources --verbs=watch
## Show output with more detail (including short names and API group)
kubectl api-resources -o wide
## Find a resource by short name
kubectl api-resources | grep -i deploy
## Shows: deployments deploy apps/v1 true Deployment
kubectl api-versions¶
## List all available API versions
kubectl api-versions
## Filter for a specific group
kubectl api-versions | grep apps
kubectl api-versions | grep networking
kubectl api-versions | grep batch
Comparing resources¶
## Get the full YAML of a running resource (useful for comparison)
kubectl get deployment nginx-lab -n kubectl-lab -o yaml > running-deployment.yaml
## Compare a local manifest with what is running in the cluster
## (This shows what would change if you applied the local file)
kubectl diff -f manifests/sample-deployment.yaml
## Clean up the temporary file
rm -f running-deployment.yaml
Step 06 - apply vs create vs replace¶
- Understanding when to use `apply`, `create`, or `replace` is critical for managing resources safely and predictably.
The three approaches¶
| Command | Style | Behavior | When to use |
|---|---|---|---|
| `kubectl apply` | Declarative | Creates or updates resources by merging with existing state | Day-to-day GitOps and config management |
| `kubectl create` | Imperative | Creates a resource; fails if the resource already exists | One-time creation, quick prototyping |
| `kubectl replace` | Imperative | Replaces the entire resource; fails if it does not exist | Full replacement (all unspecified fields are removed) |
kubectl create (imperative)¶
## Create a namespace imperatively
kubectl create namespace test-imperative
## Create a deployment imperatively (no manifest file needed)
kubectl create deployment nginx-test \
--image=nginx:1.25-alpine \
--replicas=2 \
-n test-imperative
## Create a service imperatively
kubectl create service clusterip nginx-test \
--tcp=80:80 \
-n test-imperative
## Create a configmap from literal values
kubectl create configmap app-config \
--from-literal=key1=value1 \
--from-literal=key2=value2 \
-n test-imperative
## Create a secret from literal values
kubectl create secret generic db-secret \
--from-literal=username=admin \
--from-literal=password=secret123 \
-n test-imperative
## Generate YAML without creating the resource (useful for bootstrapping manifests)
kubectl create deployment nginx-gen \
--image=nginx:1.25-alpine \
--replicas=3 \
--dry-run=client -o yaml > generated-deployment.yaml
## Clean up
kubectl delete namespace test-imperative
rm -f generated-deployment.yaml
kubectl apply (declarative)¶
## Apply a single manifest file
kubectl apply -f manifests/sample-deployment.yaml
## Apply all manifests in a directory
kubectl apply -f manifests/
## Apply with a specific namespace override
kubectl apply -f manifests/sample-deployment.yaml -n kubectl-lab
## Apply from a URL
## kubectl apply -f https://raw.githubusercontent.com/example/repo/main/manifest.yaml
## Apply and record the command in the annotation (deprecated but still used)
kubectl apply -f manifests/sample-deployment.yaml --record
## Server-side apply (recommended for CI/CD and controllers)
## Server-side apply tracks field ownership and prevents conflicts
kubectl apply -f manifests/sample-deployment.yaml --server-side
## Server-side apply with a custom field manager name
kubectl apply -f manifests/sample-deployment.yaml \
--server-side \
--field-manager=my-ci-pipeline
## Force-apply to resolve conflicts (use with caution)
kubectl apply -f manifests/sample-deployment.yaml \
--server-side \
--force-conflicts
Server-Side Apply vs Client-Side Apply
Client-side apply (default) uses the kubectl.kubernetes.io/last-applied-configuration annotation to compute diffs. Server-side apply (SSA) uses field ownership tracking in the API server. SSA is more reliable and is the recommended approach for CI/CD pipelines and controllers. SSA allows multiple managers to own different fields of the same resource without conflicts.
Dry-run modes¶
## Client-side dry run - validates locally, does NOT contact the API server
## Catches YAML syntax errors but NOT schema violations
kubectl apply -f manifests/sample-deployment.yaml --dry-run=client
## Server-side dry run - sends the request to the API server but does NOT persist
## Catches schema violations, admission webhook rejections, and RBAC issues
kubectl apply -f manifests/sample-deployment.yaml --dry-run=server
## Combine dry-run with output to see what would be created
kubectl apply -f manifests/sample-deployment.yaml --dry-run=server -o yaml
## Generate manifests with create and dry-run (great for bootstrapping)
kubectl create deployment test-gen \
--image=nginx:latest \
--replicas=3 \
--dry-run=client -o yaml
Always prefer --dry-run=server over --dry-run=client
Client-side dry-run only validates YAML syntax. Server-side dry-run sends the request to the API server, which validates the schema, runs admission webhooks, and checks RBAC - without actually creating the resource. This catches many more errors.
kubectl diff¶
## Show what would change if you applied a manifest
## (similar to 'git diff' - shows additions, removals, changes)
kubectl diff -f manifests/sample-deployment.yaml
## Diff all manifests in a directory
kubectl diff -f manifests/
## Diff with server-side apply
kubectl diff -f manifests/sample-deployment.yaml --server-side
kubectl replace¶
## Replace replaces the ENTIRE resource (not a merge)
## First export the current state
kubectl get deployment nginx-lab -n kubectl-lab -o yaml > /tmp/nginx-lab.yaml
## Modify the file, then replace
## Any fields not in the file will be REMOVED from the resource
kubectl replace -f /tmp/nginx-lab.yaml
## Replace with --force (deletes and re-creates the resource)
## WARNING: this causes downtime!
kubectl replace -f /tmp/nginx-lab.yaml --force
## Clean up temp file
rm -f /tmp/nginx-lab.yaml
Step 07 - kubectl patch¶
`kubectl patch` modifies a resource in place without replacing the entire object. There are three patch types, each with different merge semantics.
Deploy the patch target¶
## Deploy the resource we will use for patching exercises
kubectl apply -f manifests/patch-examples.yaml
## Verify the initial state
kubectl get deployment patch-target -n kubectl-lab -o yaml | head -40
Patch type 1: Strategic Merge Patch (default)¶
- This is the default and most commonly used patch type. It is Kubernetes-specific and understands how to merge lists (e.g., containers, volumes) by their merge key (usually `name`).
## Add an annotation using strategic merge patch
kubectl patch deployment patch-target -n kubectl-lab \
--type=strategic \
-p '{"metadata":{"annotations":{"patched-by":"strategic-merge"}}}'
## Update the replica count
kubectl patch deployment patch-target -n kubectl-lab \
-p '{"spec":{"replicas":3}}'
## Add a new container to the pod spec
## Strategic merge patch uses the container "name" as the merge key,
## so this ADDS a sidecar without removing the existing nginx container
kubectl patch deployment patch-target -n kubectl-lab \
-p '{"spec":{"template":{"spec":{"containers":[{"name":"sidecar","image":"busybox:1.36","command":["sh","-c","while true; do echo sidecar; sleep 60; done"]}]}}}}'
## Update an existing container's image (matched by name)
kubectl patch deployment patch-target -n kubectl-lab \
-p '{"spec":{"template":{"spec":{"containers":[{"name":"nginx","image":"nginx:1.25-alpine"}]}}}}'
## Verify the patch was applied
kubectl get deployment patch-target -n kubectl-lab -o jsonpath='{.spec.template.spec.containers[*].name}'
Patch type 2: JSON Merge Patch (RFC 7386)¶
- JSON merge patch is simpler but less powerful. It completely replaces lists instead of merging them.
## Reset the deployment first
kubectl apply -f manifests/patch-examples.yaml
## Add an annotation using JSON merge patch
kubectl patch deployment patch-target -n kubectl-lab \
--type=merge \
-p '{"metadata":{"annotations":{"patched-by":"json-merge"}}}'
## Update the image
kubectl patch deployment patch-target -n kubectl-lab \
--type=merge \
-p '{"spec":{"template":{"spec":{"containers":[{"name":"nginx","image":"nginx:1.25-alpine"}]}}}}'
JSON Merge Patch replaces arrays entirely
If you use --type=merge and specify a containers list with only one container, it will replace the entire containers array. Any containers not in your patch will be removed. This is why strategic merge patch is the default for Kubernetes resources.
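You can reproduce the array-replacement behavior offline. jq's `*` operator merges objects recursively but, like RFC 7386, takes the right-hand value wholesale for arrays (the file paths and container names here are illustrative):

```shell
## Current state: two containers
echo '{"spec":{"containers":[{"name":"nginx"},{"name":"sidecar"}]}}' > /tmp/orig.json
## A merge patch that mentions only one container
echo '{"spec":{"containers":[{"name":"nginx","image":"nginx:1.25-alpine"}]}}' > /tmp/patch.json

## After the merge, the sidecar entry is gone - the list was replaced, not merged
jq -s '.[0] * .[1] | .spec.containers' /tmp/orig.json /tmp/patch.json
```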
Patch type 3: JSON Patch (RFC 6902)¶
- JSON Patch uses an array of operations (`add`, `remove`, `replace`, `move`, `copy`, `test`) with explicit paths. It is the most precise patch type.
## Reset the deployment
kubectl apply -f manifests/patch-examples.yaml
## Add an annotation using JSON Patch
kubectl patch deployment patch-target -n kubectl-lab \
--type=json \
-p '[{"op":"add","path":"/metadata/annotations/patched-by","value":"json-patch"}]'
## Replace the replica count using JSON Patch
kubectl patch deployment patch-target -n kubectl-lab \
--type=json \
-p '[{"op":"replace","path":"/spec/replicas","value":4}]'
## Add a new label using JSON Patch
kubectl patch deployment patch-target -n kubectl-lab \
--type=json \
-p '[{"op":"add","path":"/metadata/labels/patched","value":"true"}]'
## Remove an annotation using JSON Patch
kubectl patch deployment patch-target -n kubectl-lab \
--type=json \
-p '[{"op":"remove","path":"/metadata/annotations/patched-by"}]'
## Multiple operations in a single patch
kubectl patch deployment patch-target -n kubectl-lab \
--type=json \
-p '[
{"op":"replace","path":"/spec/replicas","value":2},
{"op":"add","path":"/metadata/labels/multi-patched","value":"true"},
{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"nginx:1.25-alpine"}
]'
## Test operation - succeeds only if the value matches (useful for conditional updates)
## This will fail if spec.replicas is not 2
kubectl patch deployment patch-target -n kubectl-lab \
--type=json \
-p '[
{"op":"test","path":"/spec/replicas","value":2},
{"op":"replace","path":"/spec/replicas","value":5}
]'
Patch comparison summary¶
| Feature | Strategic Merge | JSON Merge | JSON Patch |
|---|---|---|---|
| Flag | `--type=strategic` | `--type=merge` | `--type=json` |
| List handling | Merges by key (e.g. `name`) | Replaces entire list | Operates on specific indices |
| Can delete fields | Set to `null` | Set to `null` | Use `"op":"remove"` |
| Kubernetes-aware | Yes | No | No |
| Best for | Most K8s resources | Simple updates | Surgical precision |
Step 08 - exec, cp, port-forward, attach, debug¶
- These commands are your interactive debugging toolkit for troubleshooting running pods.
Deploy the multi-container pod¶
## Deploy the multi-container pod for debugging exercises
kubectl apply -f manifests/multi-container-pod.yaml
## Wait for the pod to be ready
kubectl wait --for=condition=Ready pod/multi-container-pod \
-n kubectl-lab --timeout=120s
kubectl exec¶
## Execute a command in the default container
kubectl exec multi-container-pod -n kubectl-lab -- ls /
## Execute a command in a specific container
kubectl exec multi-container-pod -n kubectl-lab -c app -- nginx -v
## Execute a command in the sidecar container
kubectl exec multi-container-pod -n kubectl-lab -c sidecar -- cat /var/log/app/sidecar.log
## Open an interactive shell in the default container
kubectl exec -it multi-container-pod -n kubectl-lab -- /bin/sh
## Open an interactive shell in a specific container
kubectl exec -it multi-container-pod -n kubectl-lab -c sidecar -- /bin/sh
## Run multiple commands using sh -c
kubectl exec multi-container-pod -n kubectl-lab -- \
sh -c 'echo "Hostname: $(hostname)" && echo "Date: $(date)"'
## Check environment variables
kubectl exec multi-container-pod -n kubectl-lab -- env
## Check network connectivity from inside the pod
kubectl exec multi-container-pod -n kubectl-lab -- \
sh -c 'wget -qO- --timeout=5 http://nginx-lab.kubectl-lab.svc.cluster.local || echo "Service not reachable"'
## Check DNS resolution inside the pod
kubectl exec multi-container-pod -n kubectl-lab -- \
sh -c 'nslookup kubernetes.default.svc.cluster.local 2>/dev/null || cat /etc/resolv.conf'
kubectl logs¶
## View logs from the default container
kubectl logs multi-container-pod -n kubectl-lab
## View logs from a specific container
kubectl logs multi-container-pod -n kubectl-lab -c app
kubectl logs multi-container-pod -n kubectl-lab -c sidecar
## Follow logs in real time (like tail -f)
kubectl logs -f multi-container-pod -n kubectl-lab -c sidecar
## Show logs from the last 10 lines
kubectl logs --tail=10 multi-container-pod -n kubectl-lab -c sidecar
## Show logs from the last 30 seconds
kubectl logs --since=30s multi-container-pod -n kubectl-lab -c sidecar
## Show logs from all containers in a pod
kubectl logs multi-container-pod -n kubectl-lab --all-containers=true
## Show logs from all pods with a specific label
kubectl logs -l app=nginx-lab -n kubectl-lab
## Show logs with timestamps
kubectl logs --timestamps multi-container-pod -n kubectl-lab -c sidecar
## Show previous container logs (if the container crashed and restarted)
kubectl logs multi-container-pod -n kubectl-lab -c app --previous 2>/dev/null || \
echo "No previous container logs (container has not restarted)"
kubectl cp¶
## Copy a file FROM a pod to local machine
kubectl exec multi-container-pod -n kubectl-lab -c app -- \
sh -c 'echo "Hello from pod" > /tmp/test.txt'
kubectl cp kubectl-lab/multi-container-pod:/tmp/test.txt /tmp/from-pod.txt -c app
cat /tmp/from-pod.txt
## Copy a file TO a pod from local machine
echo "Hello from local" > /tmp/to-pod.txt
kubectl cp /tmp/to-pod.txt kubectl-lab/multi-container-pod:/tmp/to-pod.txt -c app
kubectl exec multi-container-pod -n kubectl-lab -c app -- cat /tmp/to-pod.txt
## Copy a directory from a pod
kubectl cp kubectl-lab/multi-container-pod:/etc/nginx /tmp/nginx-config -c app
## Clean up temp files
rm -f /tmp/from-pod.txt /tmp/to-pod.txt
rm -rf /tmp/nginx-config
kubectl cp requires tar in the container
kubectl cp uses tar under the hood. If the container does not have tar installed (e.g., distroless images), cp will fail. In that case, use kubectl exec with I/O redirection instead: kubectl exec pod -- cat /path/to/file > local-file.
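The redirection workaround can be sketched locally; substitute `kubectl exec ... --` for the plain `sh -c` below when a real pod is involved:

```shell
## A file standing in for one inside a container
echo "config payload" > /tmp/pod-file.txt

## "FROM pod": the remote cat streams into a local redirect
## (with a pod: kubectl exec <pod> -- cat /tmp/pod-file.txt > /tmp/local-copy.txt)
sh -c 'cat /tmp/pod-file.txt' > /tmp/local-copy.txt

## "TO pod": local stdin feeds a remote cat
## (with a pod: kubectl exec -i <pod> -- sh -c 'cat > /tmp/uploaded.txt' < /tmp/local-copy.txt)
sh -c 'cat > /tmp/uploaded.txt' < /tmp/local-copy.txt

cmp /tmp/pod-file.txt /tmp/uploaded.txt && echo "round-trip ok"
```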
kubectl port-forward¶
## Forward local port 8080 to pod port 80
kubectl port-forward multi-container-pod 8080:80 -n kubectl-lab &
## Test it: curl http://localhost:8080
## Kill the port-forward: kill %1
## Forward to a service (distributes across pods)
kubectl port-forward service/nginx-lab 8080:80 -n kubectl-lab &
## Test it: curl http://localhost:8080
## Kill the port-forward: kill %1
## Forward to a deployment (picks one pod)
kubectl port-forward deployment/nginx-lab 8080:80 -n kubectl-lab &
## Kill the port-forward: kill %1
## Forward multiple ports
kubectl port-forward multi-container-pod 8080:80 9090:9090 -n kubectl-lab &
## Kill the port-forward: kill %1
## Bind to all interfaces (not just localhost) - useful in VMs/containers
kubectl port-forward --address 0.0.0.0 multi-container-pod 8080:80 -n kubectl-lab &
## Kill the port-forward: kill %1
kubectl attach¶
## Attach to a running container's stdout (read-only by default)
## This is different from exec - attach connects to the main process
kubectl attach multi-container-pod -n kubectl-lab -c sidecar
## Attach with stdin enabled (for interactive processes)
## kubectl attach -it <pod> -c <container> -n <namespace>
## Press Ctrl+C to detach
kubectl debug¶
## Create an ephemeral debug container in a running pod
## This adds a temporary container with debugging tools
kubectl debug multi-container-pod -n kubectl-lab \
--image=busybox:1.36 \
-it \
--target=app \
-- /bin/sh
## Debug by creating a copy of the pod (original pod is untouched)
kubectl debug multi-container-pod -n kubectl-lab \
--image=busybox:1.36 \
--copy-to=debug-pod \
-it \
-- /bin/sh
## Debug a node by creating a privileged pod on it
## This gives you access to the node's filesystem at /host
NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl debug node/${NODE_NAME} -it --image=busybox:1.36
## Clean up debug pods
kubectl delete pod debug-pod -n kubectl-lab --ignore-not-found
## Note: debugging a node creates a pod named node-debugger-<node>-<suffix>
## in the current namespace; delete it separately when finished
Ephemeral containers vs copy-to
Ephemeral containers (--target=app) run inside the same pod and share namespaces with the target container. This means they can see the same network, PIDs, and filesystem mounts. The --copy-to flag creates a new pod which is a clone - useful when you want to debug without affecting the running pod.
Step 09 - wait, rollout, autoscale¶
- These operational commands help you manage the lifecycle of deployments and wait for conditions.
kubectl wait¶
## Wait for a deployment to be available
kubectl wait --for=condition=available deployment/nginx-lab \
-n kubectl-lab --timeout=120s
## Wait for a pod to be ready
kubectl wait --for=condition=Ready pod -l app=nginx-lab \
-n kubectl-lab --timeout=60s
## Wait for a pod to be deleted
## (start a deletion in another terminal first)
## kubectl wait --for=delete pod/<pod-name> -n kubectl-lab --timeout=60s
## Wait for a job to complete
## kubectl wait --for=condition=complete job/<job-name> -n kubectl-lab --timeout=300s
## Wait for all pods in a namespace to be ready
kubectl wait --for=condition=Ready pods --all -n kubectl-lab --timeout=120s
## Wait using a JSONPath expression
## Wait until a deployment has the desired replicas available
kubectl wait --for=jsonpath='{.status.availableReplicas}'=3 \
deployment/nginx-lab -n kubectl-lab --timeout=120s
kubectl rollout¶
## Check the rollout status of a deployment
kubectl rollout status deployment/nginx-lab -n kubectl-lab
## View the rollout history (shows revisions)
kubectl rollout history deployment/nginx-lab -n kubectl-lab
## View details of a specific revision
kubectl rollout history deployment/nginx-lab -n kubectl-lab --revision=1
## Trigger a new rollout by updating the image
kubectl set image deployment/nginx-lab nginx=nginx:1.25-alpine -n kubectl-lab
## Watch the rollout progress
kubectl rollout status deployment/nginx-lab -n kubectl-lab --watch
## Pause a rollout (prevents further updates from being applied)
kubectl rollout pause deployment/nginx-lab -n kubectl-lab
## Resume a paused rollout
kubectl rollout resume deployment/nginx-lab -n kubectl-lab
## Undo the last rollout (rollback to previous revision)
kubectl rollout undo deployment/nginx-lab -n kubectl-lab
## Rollback to a specific revision
kubectl rollout undo deployment/nginx-lab -n kubectl-lab --to-revision=1
## Restart all pods in a deployment (rolling restart)
## This is useful to pick up configmap/secret changes
kubectl rollout restart deployment/nginx-lab -n kubectl-lab
kubectl autoscale¶
## Create a Horizontal Pod Autoscaler (HPA)
## Scales between 2 and 10 replicas based on CPU utilization
kubectl autoscale deployment nginx-lab -n kubectl-lab \
--min=2 --max=10 --cpu-percent=50
## Check the HPA status
kubectl get hpa -n kubectl-lab
## Describe the HPA for detailed metrics and conditions
kubectl describe hpa nginx-lab -n kubectl-lab
## Delete the HPA when done
kubectl delete hpa nginx-lab -n kubectl-lab
HPA requires metrics-server
The Horizontal Pod Autoscaler needs a metrics source. For CPU/memory-based autoscaling, you need metrics-server installed in the cluster. For custom metrics, you need a custom metrics adapter (e.g., Prometheus Adapter).
Step 10 - kubectl auth¶
- The `auth` subcommand lets you inspect and debug RBAC (Role-Based Access Control) permissions.
Deploy RBAC resources¶
## Deploy the RBAC demo resources
kubectl apply -f manifests/rbac-demo.yaml
## Verify the resources were created
kubectl get serviceaccount,role,rolebinding -n kubectl-lab -l app=rbac-demo
kubectl auth can-i¶
## Check if your current user can perform an action
kubectl auth can-i create pods -n kubectl-lab
kubectl auth can-i delete deployments -n kubectl-lab
kubectl auth can-i get secrets -n kubectl-lab
kubectl auth can-i create clusterroles
## Check what a specific service account can do
kubectl auth can-i list pods \
--as=system:serviceaccount:kubectl-lab:lab-viewer \
-n kubectl-lab
## The lab-viewer service account should be able to list pods
kubectl auth can-i get pods \
--as=system:serviceaccount:kubectl-lab:lab-viewer \
-n kubectl-lab
## The lab-viewer service account should NOT be able to create pods
kubectl auth can-i create pods \
--as=system:serviceaccount:kubectl-lab:lab-viewer \
-n kubectl-lab
## The lab-restricted service account has NO permissions
kubectl auth can-i list pods \
--as=system:serviceaccount:kubectl-lab:lab-restricted \
-n kubectl-lab
## Check all permissions (list everything the user can do)
kubectl auth can-i --list -n kubectl-lab
## Check all permissions for a service account
kubectl auth can-i --list \
--as=system:serviceaccount:kubectl-lab:lab-viewer \
-n kubectl-lab
## Check access to a specific resource by name
kubectl auth can-i get pods/nginx-lab \
--as=system:serviceaccount:kubectl-lab:lab-viewer \
-n kubectl-lab
## Check access to subresources (like pod logs)
kubectl auth can-i get pods/log \
--as=system:serviceaccount:kubectl-lab:lab-viewer \
-n kubectl-lab
kubectl auth whoami¶
## Show the current user information (requires K8s 1.27+)
kubectl auth whoami 2>/dev/null || echo "whoami requires Kubernetes 1.27+"
## Show whoami output as YAML
kubectl auth whoami -o yaml 2>/dev/null || echo "whoami requires Kubernetes 1.27+"
Inspecting RBAC resources directly¶
## List all roles in the namespace
kubectl get roles -n kubectl-lab
## List all role bindings in the namespace
kubectl get rolebindings -n kubectl-lab
## Get the full YAML of a role (to see its rules)
kubectl get role pod-reader -n kubectl-lab -o yaml
## List cluster-level roles (there are many built-in ones)
kubectl get clusterroles | head -20
## Show details of the built-in admin ClusterRole
kubectl get clusterrole admin -o yaml
## List all cluster role bindings
kubectl get clusterrolebindings | head -20
Step 11 - kubectl Plugins and Krew¶
`kubectl` can be extended with plugins. Any executable in your `PATH` named `kubectl-*` becomes a kubectl subcommand.
How plugins work¶
## kubectl discovers plugins automatically
## Any executable named "kubectl-<name>" in your PATH becomes "kubectl <name>"
## List all installed plugins
kubectl plugin list
## Example: create a simple plugin
## (This creates a script that shows pod count per namespace)
cat > /tmp/kubectl-pod-count << 'SCRIPT'
#!/bin/bash
## kubectl pod-count - Shows the number of pods in each namespace
echo "NAMESPACE POD_COUNT"
echo "------------------- ---------"
kubectl get pods --all-namespaces --no-headers 2>/dev/null | \
awk '{count[$1]++} END {for (ns in count) printf "%-20s %d\n", ns, count[ns]}' | \
sort
SCRIPT
chmod +x /tmp/kubectl-pod-count
## Add to PATH and test it
export PATH=$PATH:/tmp
kubectl pod-count
## Clean up
rm -f /tmp/kubectl-pod-count
Installing Krew (kubectl plugin manager)¶
## Install Krew using the official installer
(
set -x; cd "$(mktemp -d)" &&
OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/aarch64/arm64/')" &&
KREW="krew-${OS}_${ARCH}" &&
curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
tar zxvf "${KREW}.tar.gz" &&
./"${KREW}" install krew
)
## Add krew to your PATH (add to ~/.bashrc or ~/.zshrc for persistence)
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
Using Krew¶
## Update the krew plugin index
kubectl krew update
## Search for plugins
kubectl krew search
## Search for a specific plugin
kubectl krew search ctx
kubectl krew search ns
## Install popular plugins
kubectl krew install ctx ## Switch contexts easily
kubectl krew install ns ## Switch namespaces easily
kubectl krew install neat ## Remove clutter from kubectl output
kubectl krew install tree ## Show resource ownership tree
kubectl krew install images ## Show container images in use
kubectl krew install access-matrix ## Show RBAC access matrix
kubectl krew install who-can ## Show who can perform an action
## Use installed plugins
kubectl ctx ## List and switch contexts
kubectl ns ## List and switch namespaces
kubectl neat get pod -n kubectl-lab -l app=nginx-lab -o yaml ## Clean YAML output
kubectl tree deployment nginx-lab -n kubectl-lab ## Resource hierarchy
## Show info about a plugin
kubectl krew info ctx
## Uninstall a plugin
kubectl krew uninstall ctx
## List installed plugins
kubectl krew list
Essential krew plugins
The most useful plugins for daily work are: ctx (context switching), ns (namespace switching), neat (clean YAML output), tree (resource hierarchy), images (list container images), who-can (RBAC query), and stern (multi-pod log tailing).
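As one example, stern (mentioned above) tails logs from many pods at once. A typical invocation might look like the following, assuming the lab deployment from earlier steps is running:

```shell
## Install stern via krew
kubectl krew install stern

## Tail logs from all pods matching the nginx-lab app label
kubectl stern -n kubectl-lab -l app=nginx-lab

## Limit output to the last 10 minutes of logs
kubectl stern -n kubectl-lab nginx-lab --since 10m
```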
Step 12 - kubectl proxy, Raw API Calls, and Token-Based Access¶
- Sometimes you need to bypass kubectl and talk directly to the Kubernetes API. This section covers three ways to do that.
kubectl proxy¶
## Start the kubectl proxy (runs in background on port 8001)
## The proxy handles authentication for you
kubectl proxy --port=8001 &
## Now you can make unauthenticated HTTP requests to the API server
## List all API versions
curl -s http://localhost:8001/api | jq .
## List all available API groups
curl -s http://localhost:8001/apis | jq '.groups[].name'
## List pods in the kubectl-lab namespace
curl -s http://localhost:8001/api/v1/namespaces/kubectl-lab/pods | jq '.items[].metadata.name'
## Get a specific deployment
curl -s http://localhost:8001/apis/apps/v1/namespaces/kubectl-lab/deployments/nginx-lab | jq '.metadata.name'
## List all services in all namespaces
curl -s http://localhost:8001/api/v1/services | jq '.items[] | {namespace: .metadata.namespace, name: .metadata.name}'
## List all nodes
curl -s http://localhost:8001/api/v1/nodes | jq '.items[].metadata.name'
## Get cluster health endpoints
curl -s http://localhost:8001/healthz
curl -s http://localhost:8001/readyz
curl -s http://localhost:8001/livez
## Stop the proxy
kill %1 2>/dev/null
kubectl raw API calls¶
## kubectl get --raw sends a raw GET request and returns the raw response
## This is useful for accessing API endpoints that kubectl doesn't have a command for
## Get the API server version
kubectl get --raw /version | jq .
## List API discovery document
kubectl get --raw /api/v1 | jq '.resources[].name' | head -20
## Get metrics (requires metrics-server)
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes 2>/dev/null | jq . || \
echo "metrics-server not installed"
## Get pod metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kubectl-lab/pods 2>/dev/null | jq . || \
echo "metrics-server not installed"
## Health check endpoints
kubectl get --raw /healthz
kubectl get --raw /readyz
kubectl get --raw /livez
Token-based access (without kubectl)¶
## Get the API server URL from kubeconfig
APISERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
echo "API Server: ${APISERVER}"
## Create a service account token for direct API access
TOKEN=$(kubectl create token lab-viewer -n kubectl-lab --duration=10m 2>/dev/null)
## If the create token command is not available (K8s < 1.24), use:
## TOKEN=$(kubectl get secret -n kubectl-lab -o jsonpath='{.items[?(@.type=="kubernetes.io/service-account-token")].data.token}' | base64 -d)
if [ -n "${TOKEN}" ]; then
## Make an API call using the token
## --insecure is used here for lab purposes; in production, use the CA cert
curl -s --insecure \
-H "Authorization: Bearer ${TOKEN}" \
"${APISERVER}/api/v1/namespaces/kubectl-lab/pods" | jq '.items[].metadata.name'
## This should fail because lab-viewer can only read pods, not deployments
curl -s --insecure \
-H "Authorization: Bearer ${TOKEN}" \
"${APISERVER}/apis/apps/v1/namespaces/kubectl-lab/deployments" | jq '.message'
else
echo "Could not create token. Skipping token-based access exercise."
fi
TLS certificates in production
The examples above use --insecure to skip TLS verification for lab purposes. In production, always use the CA certificate: curl --cacert /path/to/ca.crt -H "Authorization: Bearer $TOKEN" https://...
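As a sketch of the production-style call: when your kubeconfig embeds the cluster CA as `certificate-authority-data` (clusters that reference a CA file path differ), you can extract it and verify TLS instead of skipping it:

```shell
## Extract the embedded CA certificate from the current kubeconfig context
kubectl config view --raw --minify \
  -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | \
  base64 -d > /tmp/ca.crt

## Call the API with TLS verification instead of --insecure
curl -s --cacert /tmp/ca.crt \
  -H "Authorization: Bearer ${TOKEN}" \
  "${APISERVER}/api/v1/namespaces/kubectl-lab/pods"
```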
Step 13 - Performance Tips and Productivity¶
- This final step covers techniques that make you faster and more efficient with kubectl.
Bash/Zsh completion¶
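The commands below are a standard way to enable shell completion, following the output of `kubectl completion --help`; rc-file paths may vary on your system:

```shell
## Bash: load completion for the current session
source <(kubectl completion bash)

## Bash: persist it across sessions
echo 'source <(kubectl completion bash)' >> ~/.bashrc

## Zsh: persist it across sessions
echo 'source <(kubectl completion zsh)' >> ~/.zshrc

## Make completion work with the "k" alias too (Bash)
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc
```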
Useful aliases¶
## Add these to your ~/.bashrc or ~/.zshrc
## Basic alias
alias k='kubectl'
## Get commands
alias kg='kubectl get'
alias kgp='kubectl get pods'
alias kgd='kubectl get deployments'
alias kgs='kubectl get services'
alias kgn='kubectl get nodes'
alias kga='kubectl get all'
## Get with options
alias kgpw='kubectl get pods -o wide'
alias kgpa='kubectl get pods --all-namespaces'
## Describe
alias kd='kubectl describe'
alias kdp='kubectl describe pod'
alias kdd='kubectl describe deployment'
## Apply and delete
alias ka='kubectl apply -f'
alias kdel='kubectl delete'
## Logs
alias kl='kubectl logs'
alias klf='kubectl logs -f'
## Exec
alias kex='kubectl exec -it'
## Context and namespace
alias kctx='kubectl config get-contexts'
alias kns='kubectl config set-context --current --namespace'
Watch mode¶
## Watch pods continuously (updates in place)
kubectl get pods -n kubectl-lab --watch
## Short form
kubectl get pods -n kubectl-lab -w
## Watch with wide output
kubectl get pods -n kubectl-lab -o wide -w
## Watch events as they happen
kubectl get events -n kubectl-lab --watch
## Watch a specific resource
kubectl get deployment nginx-lab -n kubectl-lab -w
watch vs --watch
kubectl get pods --watch uses the Kubernetes watch API, which is efficient: the API server pushes updates. The watch kubectl get pods command (using the Unix watch utility) re-runs the full GET request every 2 seconds, which is less efficient but works with any output format.
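For comparison, the Unix watch form looks like this (assuming the `watch` utility is installed, which it is on most Linux distributions):

```shell
## Re-run the full query every 2 seconds (the default interval)
watch kubectl get pods -n kubectl-lab

## Custom interval, highlighting differences between refreshes
watch -n 5 -d kubectl get pods -n kubectl-lab -o wide
```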
Resource caching and fast lookups¶
## kubectl caches API discovery information locally
## Force refresh the discovery cache
kubectl api-resources --cached=false > /dev/null
## The discovery cache is stored at:
## ~/.kube/cache/discovery/
## For frequently repeated commands in scripts, use --request-timeout
kubectl get pods -n kubectl-lab --request-timeout=5s
## Use --chunk-size to paginate large result sets
kubectl get pods --all-namespaces --chunk-size=100
Quick YAML generation with --dry-run¶
## Generate deployment YAML without creating it
kubectl create deployment quick-nginx \
--image=nginx:alpine \
--replicas=3 \
--dry-run=client -o yaml
## Generate service YAML
kubectl create service clusterip my-service \
--tcp=80:8080 \
--dry-run=client -o yaml
## Generate job YAML
kubectl create job my-job \
--image=busybox \
--dry-run=client -o yaml -- echo "Hello"
## Generate cronjob YAML
kubectl create cronjob my-cronjob \
--image=busybox \
--schedule="0 * * * *" \
--dry-run=client -o yaml -- echo "Hello"
## Generate configmap YAML
kubectl create configmap my-config \
--from-literal=key=value \
--dry-run=client -o yaml
## Generate secret YAML
kubectl create secret generic my-secret \
--from-literal=password=s3cr3t \
--dry-run=client -o yaml
## Generate namespace YAML
kubectl create namespace my-ns \
--dry-run=client -o yaml
## Generate serviceaccount YAML
kubectl create serviceaccount my-sa \
--dry-run=client -o yaml
## Generate role YAML
kubectl create role my-role \
--verb=get,list \
--resource=pods \
--dry-run=client -o yaml
## Generate rolebinding YAML
kubectl create rolebinding my-binding \
--role=my-role \
--serviceaccount=default:my-sa \
--dry-run=client -o yaml
kubectl top (resource usage)¶
## Show resource usage for nodes (requires metrics-server)
kubectl top nodes 2>/dev/null || echo "metrics-server not installed"
## Show resource usage for pods
kubectl top pods -n kubectl-lab 2>/dev/null || echo "metrics-server not installed"
## Sort by CPU usage
kubectl top pods -n kubectl-lab --sort-by=cpu 2>/dev/null || echo "metrics-server not installed"
## Sort by memory usage
kubectl top pods -n kubectl-lab --sort-by=memory 2>/dev/null || echo "metrics-server not installed"
## Show container-level resource usage
kubectl top pods -n kubectl-lab --containers 2>/dev/null || echo "metrics-server not installed"
## Show resource usage across all namespaces
kubectl top pods -A --sort-by=cpu 2>/dev/null || echo "metrics-server not installed"
Exercises¶
The following exercises will test your deep understanding of kubectl.
Try to solve each exercise on your own before revealing the solution.
01. Get All Non-Running Pods¶
Get all pods across all namespaces that are NOT in Running state using a field selector.
Scenario:¶
- You are troubleshooting a cluster and need to quickly find all pods that are not healthy.
- Field selectors let you filter server-side, which is more efficient than client-side filtering.
Hint: Use --field-selector with the status.phase field and the != operator.
Solution
## Get all pods that are NOT in Running state across all namespaces
## The field selector status.phase!=Running filters on the API server side
kubectl get pods --all-namespaces \
--field-selector status.phase!=Running
## For more detail, add wide output
kubectl get pods --all-namespaces \
--field-selector status.phase!=Running \
-o wide
## Combine with other phases to be more specific
## Find only Failed pods
kubectl get pods --all-namespaces \
--field-selector status.phase=Failed
## Find Pending pods (often indicates scheduling issues)
kubectl get pods --all-namespaces \
--field-selector status.phase=Pending
02. Extract All Container Images¶
Extract all unique container images running in the cluster using JSONPath.
Scenario:¶
- You need to audit which container images are deployed across the cluster for security scanning.
- JSONPath combined with shell tools gives you a powerful extraction pipeline.
Hint: Use -o jsonpath to extract .spec.containers[*].image from all pods.
Solution
## Method 1: JSONPath with tr and sort
## Get all container images from all pods, deduplicate and sort
kubectl get pods --all-namespaces \
-o jsonpath='{.items[*].spec.containers[*].image}' | \
tr ' ' '\n' | sort -u
## Method 2: JSONPath range for cleaner output
kubectl get pods --all-namespaces \
-o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | \
sort -u
## Method 3: Include init containers too (for a complete audit)
echo "=== Regular Containers ==="
kubectl get pods --all-namespaces \
-o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | sort -u
echo ""
echo "=== Init Containers ==="
kubectl get pods --all-namespaces \
-o jsonpath='{.items[*].spec.initContainers[*].image}' | tr ' ' '\n' | sort -u
## Method 4: custom-columns approach
kubectl get pods --all-namespaces \
-o custom-columns='IMAGE:.spec.containers[*].image' --no-headers | \
tr ',' '\n' | sort -u
03. Custom Columns: Pod Dashboard¶
Use custom-columns to display pod name, node, status, and restart count in a clean table.
Scenario:¶
- You want a quick dashboard view of pod health without the noise of full get -o wide output.
- Custom columns let you select exactly the fields you care about.
Hint: Use -o custom-columns with JSONPath expressions for each column.
Solution
## Custom columns showing name, node, status, restart count, and age
kubectl get pods -n kubectl-lab \
-o custom-columns='\
NAME:.metadata.name,\
NODE:.spec.nodeName,\
STATUS:.status.phase,\
RESTARTS:.status.containerStatuses[0].restartCount,\
IP:.status.podIP'
## Extended version with container image
kubectl get pods -n kubectl-lab \
-o custom-columns=\
NAME:.metadata.name,\
NODE:.spec.nodeName,\
STATUS:.status.phase,\
RESTARTS:.status.containerStatuses[0].restartCount,\
IMAGE:.spec.containers[0].image,\
IP:.status.podIP
## All namespaces with namespace column
kubectl get pods --all-namespaces \
-o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
STATUS:.status.phase,\
RESTARTS:.status.containerStatuses[0].restartCount,\
NODE:.spec.nodeName
04. Switch Kubeconfig Contexts¶
Switch between multiple kubeconfig contexts and verify which cluster is active after each switch.
Scenario:¶
- You manage multiple clusters (dev, staging, production) and need to switch between them safely.
- Always verify the active context before running commands to avoid accidental changes to the wrong cluster.
Hint: Use kubectl config get-contexts, kubectl config use-context, and kubectl config current-context.
Solution
## Step 1: List all available contexts
kubectl config get-contexts
## Step 2: Note the current context (the one marked with *)
kubectl config current-context
## Step 3: Switch to a different context (replace with your actual context name)
## kubectl config use-context <other-context-name>
## Step 4: Verify the switch
kubectl config current-context
## Step 5: Verify the cluster you are now connected to
kubectl cluster-info
## Step 6: Switch back to the original context
## kubectl config use-context <original-context-name>
## Step 7: Use --context flag for one-off commands without switching
## kubectl get pods --context=<other-context-name>
## Pro tip: use kubectl config rename-context for clearer names
## kubectl config rename-context old-name new-name
05. Preview Changes with kubectl diff¶
Use kubectl diff to preview what changes would be applied to the cluster before actually applying a manifest.
Scenario:¶
- Before applying changes to production, you want to see exactly what will change.
- kubectl diff shows a unified diff between the live cluster state and the local manifest.
Hint: Modify a field in the local manifest (e.g., replica count) and run kubectl diff -f <file>.
Solution
## Step 1: Make sure the deployment is applied
kubectl apply -f manifests/sample-deployment.yaml
## Step 2: Create a modified version of the manifest
cp manifests/sample-deployment.yaml /tmp/modified-deployment.yaml
## Step 3: Change the replica count in the modified file (from 3 to 5)
sed -i.bak 's/replicas: 3/replicas: 5/' /tmp/modified-deployment.yaml
## Step 4: Use kubectl diff to preview the changes
## Lines prefixed with - are the current state, + are the proposed changes
kubectl diff -f /tmp/modified-deployment.yaml
## Step 5: If the diff looks good, apply it
## kubectl apply -f /tmp/modified-deployment.yaml
## Step 6: You can also diff an entire directory
kubectl diff -f manifests/
## Clean up
rm -f /tmp/modified-deployment.yaml /tmp/modified-deployment.yaml.bak
## Restore original state
kubectl apply -f manifests/sample-deployment.yaml
06. Patch a Deployment with Strategic Merge Patch¶
Use kubectl patch with a strategic merge patch to add an annotation and update the deployment’s labels without affecting other fields.
Scenario:¶
- You need to add monitoring annotations to existing deployments without redeploying.
- Strategic merge patch is the safest option because it merges intelligently.
Hint: Use kubectl patch with --type=strategic (or omit --type as it is the default) and a JSON payload.
Solution
## Add annotations to the deployment
kubectl patch deployment nginx-lab -n kubectl-lab \
-p '{
"metadata": {
"annotations": {
"monitoring.example.com/enabled": "true",
"monitoring.example.com/port": "80"
}
}
}'
## Verify the annotations were added
kubectl get deployment nginx-lab -n kubectl-lab \
-o jsonpath='{.metadata.annotations}' | jq .
## Add a new label without removing existing ones
kubectl patch deployment nginx-lab -n kubectl-lab \
-p '{"metadata":{"labels":{"patched":"true"}}}'
## Verify the labels (all original labels should still be present)
kubectl get deployment nginx-lab -n kubectl-lab \
-o jsonpath='{.metadata.labels}' | jq .
## Clean up the patch - remove the added annotation
kubectl patch deployment nginx-lab -n kubectl-lab \
--type=json \
-p '[{"op":"remove","path":"/metadata/annotations/monitoring.example.com~1enabled"},
{"op":"remove","path":"/metadata/annotations/monitoring.example.com~1port"}]'
07. Debug with an Ephemeral Container¶
Use kubectl debug to attach an ephemeral container to a running pod and inspect its network connectivity and filesystem.
Scenario:¶
- A pod is experiencing network issues, but its container does not have debugging tools installed.
- Ephemeral containers let you inject a debugging container into the running pod without restarting it.
Hint: Use kubectl debug <pod> --image=busybox --target=<container> -it -- /bin/sh.
Solution
## Get the name of a running pod
POD_NAME=$(kubectl get pods -n kubectl-lab -l app=nginx-lab \
-o jsonpath='{.items[0].metadata.name}')
echo "Debugging pod: ${POD_NAME}"
## Attach an ephemeral container to the running pod
## --target=nginx shares the PID namespace with the nginx container
kubectl debug "${POD_NAME}" -n kubectl-lab \
--image=busybox:1.36 \
--target=nginx \
-it -- /bin/sh
## Inside the ephemeral container, you can:
## - Check network: wget -qO- http://localhost
## - Check DNS: nslookup kubernetes.default.svc.cluster.local
## - Check processes: ps aux (if PID sharing is enabled)
## - Check filesystem: ls /proc/1/root/ (access target container's filesystem)
## - Exit: type 'exit'
## Alternative: create a copy of the pod for debugging
kubectl debug "${POD_NAME}" -n kubectl-lab \
--image=busybox:1.36 \
--copy-to=debug-copy-pod \
-it -- /bin/sh
## Clean up the debug copy pod
kubectl delete pod debug-copy-pod -n kubectl-lab --ignore-not-found
08. Server-Side Dry Run Validation¶
Use --dry-run=server to validate a manifest against the API server without actually creating the resource.
Scenario:¶
- You have a YAML manifest and want to ensure it is valid against the cluster's schema and admission webhooks before applying it.
- Server-side dry run catches more errors than client-side dry run.
Hint: Use kubectl apply -f <file> --dry-run=server and observe the output for validation errors.
Solution
## Step 1: Validate a correct manifest (should succeed)
kubectl apply -f manifests/sample-deployment.yaml --dry-run=server
## Output: deployment.apps/nginx-lab configured (server dry run)
## Step 2: Create an intentionally broken manifest
cat > /tmp/broken-manifest.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: broken-deploy
namespace: kubectl-lab
spec:
replicas: 3
selector:
matchLabels:
app: broken
template:
metadata:
labels:
app: broken
spec:
containers:
- name: app
image: nginx:latest
ports:
- containerPort: "eighty" ## Invalid: must be integer
EOF
## Step 3: Validate the broken manifest with server-side dry run
## This should return an error about the invalid port
kubectl apply -f /tmp/broken-manifest.yaml --dry-run=server 2>&1 || true
## Step 4: Compare with client-side dry run (may not catch the error)
kubectl apply -f /tmp/broken-manifest.yaml --dry-run=client 2>&1 || true
## Step 5: Fix the manifest and validate again
sed 's/"eighty"/80/' /tmp/broken-manifest.yaml | kubectl apply --dry-run=server -f -
## Clean up
rm -f /tmp/broken-manifest.yaml
09. Find API Resources by Verb¶
Find all Kubernetes API resources that support the “list” verb.
Scenario:¶
- You are building automation and need to know which resources can be listed programmatically.
- Different resources support different verbs (get, list, create, update, delete, watch, patch).
Hint: Use kubectl api-resources with the --verbs flag.
Solution
## List all resources that support the "list" verb
kubectl api-resources --verbs=list
## List resources that support both "list" and "watch" (for informer-based controllers)
kubectl api-resources --verbs=list,watch
## List resources that support "create" (can be created)
kubectl api-resources --verbs=create
## List resources that can be deleted
kubectl api-resources --verbs=delete
## Find resources that support the "patch" verb
kubectl api-resources --verbs=patch
## Combine with namespace filter to show only namespaced resources that support list
kubectl api-resources --verbs=list --namespaced=true
## Count resources by API group
kubectl api-resources --verbs=list -o name | cut -d. -f2- | sort | uniq -c | sort -rn
## Find resources that DON'T support create (read-only resources)
## Compare the full list with create-supporting resources
diff <(kubectl api-resources -o name | sort) \
<(kubectl api-resources --verbs=create -o name | sort) | \
grep '^<' | sed 's/< //'
10. Check Service Account Permissions¶
Use kubectl auth can-i to check what a specific service account is allowed to do.
Scenario:¶
- A team member reports that their application (running as a service account) cannot access certain resources.
- You need to diagnose the exact RBAC permissions.
Hint: Use kubectl auth can-i with the --as=system:serviceaccount:<namespace>:<name> flag and --list.
Solution
## Make sure RBAC resources are deployed
kubectl apply -f manifests/rbac-demo.yaml
## Check if lab-viewer can list pods (should be YES)
kubectl auth can-i list pods \
--as=system:serviceaccount:kubectl-lab:lab-viewer \
-n kubectl-lab
## Check if lab-viewer can create pods (should be NO)
kubectl auth can-i create pods \
--as=system:serviceaccount:kubectl-lab:lab-viewer \
-n kubectl-lab
## Check if lab-viewer can read pod logs (should be YES, per the Role definition)
kubectl auth can-i get pods/log \
--as=system:serviceaccount:kubectl-lab:lab-viewer \
-n kubectl-lab
## Check if lab-viewer can delete deployments (should be NO)
kubectl auth can-i delete deployments \
--as=system:serviceaccount:kubectl-lab:lab-viewer \
-n kubectl-lab
## List ALL permissions for the lab-viewer service account
kubectl auth can-i --list \
--as=system:serviceaccount:kubectl-lab:lab-viewer \
-n kubectl-lab
## Check the restricted service account (should have NO permissions)
kubectl auth can-i --list \
--as=system:serviceaccount:kubectl-lab:lab-restricted \
-n kubectl-lab
## Check if the restricted account can even list pods (should be NO)
kubectl auth can-i list pods \
--as=system:serviceaccount:kubectl-lab:lab-restricted \
-n kubectl-lab
11. Generate a CSV with Go Templates¶
Use go-template output formatting to generate a CSV of pod information (name, namespace, status, IP, node).
Scenario:¶
- You need to export pod information to a spreadsheet or feed it into another tool.
- Go templates give you complete control over output formatting.
Hint: Use -o go-template with {{range .items}} and {{"\n"}} for newlines.
Solution
## Generate a CSV header and data for all pods in the namespace
echo "NAME,NAMESPACE,STATUS,POD_IP,NODE"
kubectl get pods -n kubectl-lab \
-o go-template='{{range .items}}{{.metadata.name}},{{.metadata.namespace}},{{.status.phase}},{{.status.podIP}},{{.spec.nodeName}}{{"\n"}}{{end}}'
## Generate CSV for all namespaces
echo "NAME,NAMESPACE,STATUS,POD_IP,NODE"
kubectl get pods --all-namespaces \
-o go-template='{{range .items}}{{.metadata.name}},{{.metadata.namespace}},{{.status.phase}},{{.status.podIP}},{{.spec.nodeName}}{{"\n"}}{{end}}'
## Generate CSV with container information
echo "POD,CONTAINER,IMAGE"
kubectl get pods -n kubectl-lab \
-o go-template='{{range .items}}{{$pod := .metadata.name}}{{range .spec.containers}}{{$pod}},{{.name}},{{.image}}{{"\n"}}{{end}}{{end}}'
## Save to a file
echo "NAME,NAMESPACE,STATUS,POD_IP,NODE" > /tmp/pods.csv
kubectl get pods --all-namespaces \
-o go-template='{{range .items}}{{.metadata.name}},{{.metadata.namespace}},{{.status.phase}},{{.status.podIP}},{{.spec.nodeName}}{{"\n"}}{{end}}' >> /tmp/pods.csv
echo "CSV saved to /tmp/pods.csv"
cat /tmp/pods.csv
## Clean up
rm -f /tmp/pods.csv
12. Wait for a Deployment Rollout¶
Use kubectl wait to block until a deployment is fully rolled out with all replicas available.
Scenario:¶
- In a CI/CD pipeline, you need to wait for a deployment to be fully ready before proceeding with integration tests.
- kubectl wait is designed exactly for this purpose: it exits with code 0 when the condition is met.
Hint: Use kubectl wait --for=condition=available or kubectl rollout status.
Solution
## Method 1: kubectl wait with condition
## Wait for the deployment to report the Available condition as True
kubectl wait --for=condition=available deployment/nginx-lab \
-n kubectl-lab --timeout=120s
## Method 2: kubectl rollout status (blocks until complete)
kubectl rollout status deployment/nginx-lab -n kubectl-lab
## Method 3: Wait for a specific number of ready replicas using JSONPath
kubectl wait --for=jsonpath='{.status.readyReplicas}'=3 \
deployment/nginx-lab -n kubectl-lab --timeout=120s
## Simulate a rollout and wait for it
## Step 1: Trigger a rolling update
kubectl set image deployment/nginx-lab nginx=nginx:1.25-alpine -n kubectl-lab
## Step 2: Wait for the rollout to complete
kubectl rollout status deployment/nginx-lab -n kubectl-lab --timeout=120s
echo "Rollout complete! Deployment is ready."
## Method 4: Wait for all pods to be Ready
kubectl wait --for=condition=Ready pod -l app=nginx-lab \
-n kubectl-lab --timeout=120s
## Use in a script (checking exit code)
if kubectl wait --for=condition=available deployment/nginx-lab \
-n kubectl-lab --timeout=60s; then
echo "Deployment is ready, proceeding with tests..."
else
echo "Deployment failed to become ready within timeout!"
exit 1
fi
13. Access the API via kubectl proxy¶
Use kubectl proxy to access the Kubernetes API server via a local HTTP proxy without authentication.
Scenario:¶
- You want to explore the Kubernetes API using curl or a web browser for debugging or learning.
- kubectl proxy handles all the authentication so you can make simple HTTP requests.
Hint: Start the proxy with kubectl proxy &, then use curl http://localhost:8001/api/....
Solution
## Start the kubectl proxy in the background on port 8001
kubectl proxy --port=8001 &
PROXY_PID=$!
echo "Proxy started with PID: ${PROXY_PID}"
## Wait a moment for the proxy to start
sleep 2
## List the core API versions
curl -s http://localhost:8001/api | jq '.versions'
## List all pods in the kubectl-lab namespace
curl -s http://localhost:8001/api/v1/namespaces/kubectl-lab/pods | \
jq '.items[].metadata.name'
## Get a specific deployment
curl -s http://localhost:8001/apis/apps/v1/namespaces/kubectl-lab/deployments/nginx-lab | \
jq '{name: .metadata.name, replicas: .spec.replicas, availableReplicas: .status.availableReplicas}'
## List all namespaces
curl -s http://localhost:8001/api/v1/namespaces | jq '.items[].metadata.name'
## Check the API server health
echo "Health: $(curl -s http://localhost:8001/healthz)"
echo "Ready: $(curl -s http://localhost:8001/readyz)"
## List all API groups
curl -s http://localhost:8001/apis | jq '.groups[].name'
## Stop the proxy
kill ${PROXY_PID}
echo "Proxy stopped."
14. Create a Custom kubectl Alias Script¶
Create a shell function or alias that shows pods alongside their resource requests and limits.
Scenario:¶
- You frequently need to check pod resource allocation to debug scheduling and OOM issues.
- A custom alias saves you from typing complex JSONPath expressions every time.
Hint: Create a shell function that uses -o custom-columns or -o go-template with resource fields.
Solution
## Method 1: Shell function using custom-columns
kpod_resources() {
local ns="${1:---all-namespaces}"
if [ "$ns" != "--all-namespaces" ] && [ "$ns" != "-A" ]; then
ns="-n $ns"
fi
kubectl get pods ${ns} \
-o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
CPU_REQ:.spec.containers[0].resources.requests.cpu,\
CPU_LIM:.spec.containers[0].resources.limits.cpu,\
MEM_REQ:.spec.containers[0].resources.requests.memory,\
MEM_LIM:.spec.containers[0].resources.limits.memory,\
STATUS:.status.phase
}
## Use it for a specific namespace
kpod_resources kubectl-lab
## Use it for all namespaces
kpod_resources --all-namespaces
## Method 2: Shell function using go-template for CSV-style output
kpod_csv() {
echo "NAMESPACE,POD,CONTAINER,CPU_REQ,CPU_LIM,MEM_REQ,MEM_LIM"
kubectl get pods "${@}" \
-o go-template='{{range .items}}{{$ns := .metadata.namespace}}{{$pod := .metadata.name}}{{range .spec.containers}}{{$ns}},{{$pod}},{{.name}},{{if .resources.requests.cpu}}{{.resources.requests.cpu}}{{else}}<none>{{end}},{{if .resources.limits.cpu}}{{.resources.limits.cpu}}{{else}}<none>{{end}},{{if .resources.requests.memory}}{{.resources.requests.memory}}{{else}}<none>{{end}},{{if .resources.limits.memory}}{{.resources.limits.memory}}{{else}}<none>{{end}}{{"\n"}}{{end}}{{end}}'
}
## Use it
kpod_csv -n kubectl-lab
## To make these permanent, add them to ~/.bashrc or ~/.zshrc
15. Find Resource-Hungry Pods with kubectl top¶
Use kubectl top with --sort-by to find the most resource-hungry pods in the cluster.
Scenario:¶
- The cluster is running low on resources and you need to identify which pods are consuming the most CPU and memory.
- kubectl top provides real-time resource consumption data (requires metrics-server).
Hint: Use kubectl top pods --sort-by=cpu or --sort-by=memory with --all-namespaces.
Solution
## NOTE: These commands require metrics-server to be installed in the cluster
## Install metrics-server if not present:
## kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
## Find top CPU consumers across all namespaces
## (use pipefail so a kubectl failure is not masked by head's exit status)
set -o pipefail
kubectl top pods --all-namespaces --sort-by=cpu 2>/dev/null | head -20 || \
echo "metrics-server not installed. Install it first."
## Find top memory consumers across all namespaces
kubectl top pods --all-namespaces --sort-by=memory 2>/dev/null | head -20 || \
echo "metrics-server not installed. Install it first."
## Find top CPU consumers in a specific namespace
kubectl top pods -n kubectl-lab --sort-by=cpu 2>/dev/null || \
echo "metrics-server not installed."
## Show per-container resource usage
kubectl top pods -n kubectl-lab --containers --sort-by=cpu 2>/dev/null || \
echo "metrics-server not installed."
## Node resource usage (to check overall cluster capacity)
kubectl top nodes --sort-by=cpu 2>/dev/null || \
echo "metrics-server not installed."
## Combine with other commands for a complete picture
## Show pods sorted by CPU alongside their resource requests
echo "=== Top CPU Pods ==="
kubectl top pods -n kubectl-lab --sort-by=cpu 2>/dev/null || echo "metrics-server required"
echo ""
echo "=== Pod Resource Requests ==="
kubectl get pods -n kubectl-lab -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[0].resources.requests.cpu,MEM_REQ:.spec.containers[0].resources.requests.memory'
Finalize & Cleanup¶
- To remove all resources created by this lab, run the following commands:
## Delete all lab resources by removing the namespace
## This deletes everything inside it (pods, deployments, services, roles, etc.)
kubectl delete namespace kubectl-lab
## Verify the namespace is deleted
kubectl get namespace kubectl-lab 2>/dev/null || echo "Namespace kubectl-lab deleted successfully"
- If you created any temporary files during the exercises:
## Clean up any temp files created during the lab
rm -f /tmp/from-pod.txt /tmp/to-pod.txt /tmp/broken-manifest.yaml
rm -f /tmp/modified-deployment.yaml /tmp/pods.csv
rm -rf /tmp/nginx-config
rm -f /tmp/kubectl-pod-count
- If you modified your shell configuration for aliases or completions, those changes persist in your ~/.bashrc or ~/.zshrc file. They are useful to keep.
Troubleshooting¶
- kubectl: command not found:
Make sure kubectl is installed and in your PATH. Check with:
## Verify the binary location and client version
which kubectl
kubectl version --client
- Unable to connect to the server:
Check that your cluster is running and your kubeconfig is correct:
## Check the current context
kubectl config current-context
## Check the cluster endpoint
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
## Try to reach the API server
kubectl cluster-info
- error: the server doesn’t have a resource type:
The resource might be in a different API group, or the CRD might not be installed:
## List all available API resources
kubectl api-resources | grep -i <resource-name>
## Check if a CRD is installed
kubectl get crd | grep -i <resource-name>
- pods not starting (Pending, CrashLoopBackOff, ImagePullBackOff):
## Check pod status and events
kubectl describe pod <pod-name> -n kubectl-lab
## Check pod logs
kubectl logs <pod-name> -n kubectl-lab
## Check events in the namespace
kubectl get events -n kubectl-lab --sort-by='.lastTimestamp'
- RBAC permission denied:
## Check what your current user can do
kubectl auth can-i --list -n kubectl-lab
## Check with verbosity to see the API call
kubectl get pods -n kubectl-lab -v=6
- JSONPath expressions returning empty results:
## First, check the raw JSON structure to understand the path
kubectl get pods -n kubectl-lab -o json | jq '.' | head -50
## Verify the field exists at the expected path
kubectl get pods -n kubectl-lab -o json | jq '.items[0].status.phase'
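Once the structure is known, the same field can be read with kubectl's built-in JSONPath output, with no jq dependency (the namespace here mirrors the lab examples above):

```shell
## JSONPath equivalent: read the first pod's phase without jq
kubectl get pods -n kubectl-lab -o jsonpath='{.items[0].status.phase}'

## Iterate over all pods when the expression targets a list
kubectl get pods -n kubectl-lab -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
```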
- metrics-server not available (kubectl top fails):
## Check if metrics-server is installed
kubectl get deployment metrics-server -n kube-system 2>/dev/null || \
echo "metrics-server is not installed"
## Install metrics-server (for lab/dev clusters)
## kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
- kubectl debug fails (ephemeral containers not supported):
Ephemeral containers require Kubernetes 1.25+. Check your cluster version:
## Show client and server versions
kubectl version
Next Steps¶
- Practice these commands daily until they become muscle memory. The best way to learn kubectl is to use it constantly.
- Explore the kubectl reference documentation for commands not covered in this lab.
- Try writing shell scripts that combine multiple kubectl commands for real operational tasks (e.g., automated health checks, resource reports, cleanup scripts).
- Install and explore additional krew plugins: stern (multi-pod log tailing), sniff (packet capture), resource-capacity (node capacity planning), ktop (terminal-based resource dashboard).
- Learn about Kubernetes client libraries (client-go, Python kubernetes client) for programmatic access to the same API that kubectl uses.
- Study the Kubernetes API conventions to understand why the API works the way it does.
- Explore kubectl plugins development to build your own custom commands for your organization.
KubernetesLabs Tasks¶
- Welcome to the KubernetesLabs Tasks section.
- Each folder below contains a hands-on task/exercise that you can complete independently to practice specific Kubernetes skills.
- Follow the README file in each task for detailed instructions and solutions.
Task Index¶
Core Kubernetes¶
| Task | Description |
|---|---|
| Kubernetes CLI Tasks | Collection of comprehensive Kubernetes exercises covering CLI commands, pod debugging, deployments, services, configmaps, secrets, and more. |
| Kubernetes Service Tasks | Exercises for Services, Networking, and Service Discovery |
| Kubernetes Scheduling Tasks | Node Affinity, Pod Affinity, Anti-Affinity, Taints, Tolerations, and Topology Spread Constraints |
Tools & Ecosystem¶
| Task | Description |
|---|---|
| Kubernetes Helm Tasks | Helm chart creation, packaging, templating, repositories, and deployment best practices |
| Kubernetes ArgoCD Tasks | ArgoCD installation, CLI usage, application deployment, GitOps workflows, App of Apps, sync waves, and fleet management |
| Kubernetes Kubebuilder Tasks | Kubebuilder operator development, CRD creation, reconciliation loops, webhooks, and testing |
| Kubernetes KEDA Tasks | KEDA event-driven autoscaling, ScaledObjects, ScaledJobs, TriggerAuthentication, and scaling patterns |
| Kubernetes Harbor + ArgoCD Airgap Tasks | Harbor registry, Nginx Ingress, fully offline/airgap ArgoCD deployment, image mirroring, Helm chart GitOps, and end-to-end pipeline |
Happy learning and hacking with Kubernetes!
Kubernetes CLI Tasks¶
- Hands-on Kubernetes exercises covering essential CLI commands, debugging techniques, and advanced orchestration concepts.
- Each task includes a description and a detailed solution with step-by-step instructions.
- Practice these tasks to master Kubernetes from basic operations to advanced deployment scenarios.
Table of Contents¶
- 01. Kubernetes Pod Workflow
- 02. Pod Debugging Challenge
- 03. Imperative to Declarative
- 04. Scaling Deployments
- 05. Rolling Updates and Rollbacks
- 06. ConfigMaps and Environment Variables
- 07. Secrets Management
- 08. Persistent Storage with PVCs
- 09. Multi-Container Pods
- 10. Jobs and CronJobs
- 11. Namespaces and Isolation
- 12. Resource Limits and Quotas
- 13. Liveness and Readiness Probes
- 14. Node Selection and Affinity
01. Kubernetes Pod Workflow¶
Start an nginx pod, verify it’s running, execute a command inside it to check the version, and then delete it.
Scenario:¶
- As a developer, you need to quickly verify a container image or run a temporary workload without creating a full deployment.
- This workflow allows you to spin up pods, interact with them, and clean them up efficiently.
Hint: kubectl run, kubectl get, kubectl exec, kubectl delete
Solution
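A minimal sequence covering the workflow (the pod name and image tag are illustrative):

```shell
# 1. Start an nginx pod
kubectl run nginx-test --image=nginx:alpine

# 2. Verify it is running
kubectl get pod nginx-test

# 3. Execute a command inside it to check the nginx version
kubectl exec nginx-test -- nginx -v

# 4. Delete the pod when done
kubectl delete pod nginx-test
```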
02. Pod Debugging Challenge¶
Run a pod that is destined to fail (e.g., using a non-existent image), inspect its status, find the error reason, and then fix it (by creating a correct one).
Scenario:¶
- Your application pod is stuck in ImagePullBackOff or CrashLoopBackOff.
- You need to diagnose the issue using Kubernetes inspection tools to understand why it’s failing.
Hint: kubectl run, kubectl get, kubectl describe, kubectl logs
Solution
# 1. Run a pod with a wrong image
kubectl run bad-pod --image=nginx:wrongtag
# 2. Check status (should show ErrImagePull or ImagePullBackOff)
kubectl get pods
# 3. Describe the pod to see events
kubectl describe pod bad-pod
# 4. Delete the bad pod
kubectl delete pod bad-pod
# 5. Run a correct pod
kubectl run good-pod --image=nginx:alpine
03. Imperative to Declarative¶
Create a pod using an imperative command, export its configuration to a YAML file, delete the pod, and recreate it using the YAML file.
Scenario:¶
- You want to move from ad-hoc CLI commands to Infrastructure as Code (IaC).
- Generating YAML from existing resources or dry-runs is a quick way to scaffold your manifests.
Hint: kubectl run --dry-run=client -o yaml
Solution
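One possible solution (pod and file names are illustrative):

```shell
# 1. Create a pod imperatively
kubectl run declarative-pod --image=nginx:alpine

# 2. Export its configuration to a YAML file
kubectl get pod declarative-pod -o yaml > pod.yaml

# 3. Delete the pod
kubectl delete pod declarative-pod

# 4. Recreate it from the YAML file
kubectl apply -f pod.yaml

# Tip: generate clean YAML without creating the resource first
kubectl run declarative-pod --image=nginx:alpine --dry-run=client -o yaml > pod.yaml

# Cleanup
kubectl delete pod declarative-pod
rm -f pod.yaml
```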
04. Scaling Deployments¶
Create a deployment with 2 replicas, verify them, and then scale it up to 5 replicas.
Scenario:¶
- Your application is receiving high traffic and you need to increase capacity.
- Kubernetes Deployments make scaling stateless applications trivial.
Hint: kubectl create deployment, kubectl scale
Solution
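A straightforward sequence (the deployment name is illustrative):

```shell
# 1. Create a deployment with 2 replicas
kubectl create deployment scale-app --image=nginx:alpine --replicas=2

# 2. Verify the replicas
kubectl get deployment scale-app
kubectl get pods -l app=scale-app

# 3. Scale up to 5 replicas
kubectl scale deployment scale-app --replicas=5

# 4. Verify the new replica count
kubectl get pods -l app=scale-app

# Cleanup
kubectl delete deployment scale-app
```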
05. Rolling Updates and Rollbacks¶
Update the image of a deployment to a new version, watch the rollout status, and then rollback to the previous version.
Scenario:¶
- You deployed a new version of your app, but it has a bug.
- You need to quickly revert to the last stable version without downtime.
Hint: kubectl set image, kubectl rollout status, kubectl rollout undo
Solution
# 1. Create deployment with nginx:1.21
kubectl create deployment web-app --image=nginx:1.21 --replicas=3
# 2. Update image to nginx:1.22
kubectl set image deployment/web-app nginx=nginx:1.22
# 3. Watch rollout
kubectl rollout status deployment/web-app
# 4. Rollback to previous version
kubectl rollout undo deployment/web-app
# Cleanup
kubectl delete deployment web-app
06. ConfigMaps and Environment Variables¶
Create a ConfigMap with some data and inject it into a pod as environment variables.
Scenario:¶
- You need to configure your application (e.g., DB host, API URL) without hardcoding values in the image.
- ConfigMaps decouple configuration artifacts from image content.
Hint: kubectl create configmap, envFrom in YAML
Solution
# 1. Create a ConfigMap
kubectl create configmap app-config --from-literal=APP_COLOR=blue --from-literal=APP_MODE=prod
# 2. Create a pod that uses it (using dry-run to generate yaml first is easier)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: config-pod
spec:
containers:
- name: test-container
image: busybox
command: [ "sh", "-c", "env" ]
envFrom:
- configMapRef:
name: app-config
EOF
# 3. Check logs to see env vars
kubectl logs config-pod | grep APP_
# Cleanup
kubectl delete pod config-pod
kubectl delete cm app-config
07. Secrets Management¶
Create a Secret and mount it as a volume in a pod.
Scenario:¶
- Your application needs sensitive data like passwords or API keys.
- Secrets store this data securely and can be mounted as files or env vars.
Hint: kubectl create secret, volumeMounts
Solution
# 1. Create a generic secret
kubectl create secret generic my-secret --from-literal=password=s3cr3t
# 2. Create a pod mounting the secret
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: secret-pod
spec:
containers:
- name: busybox
image: busybox
command: ["sleep", "3600"]
volumeMounts:
- name: secret-volume
mountPath: "/etc/secret-volume"
readOnly: true
volumes:
- name: secret-volume
secret:
secretName: my-secret
EOF
# 3. Verify secret file exists
kubectl exec secret-pod -- cat /etc/secret-volume/password
# Cleanup
kubectl delete pod secret-pod
kubectl delete secret my-secret
08. Persistent Storage with PVCs¶
Create a PersistentVolumeClaim (PVC) and mount it to a pod to persist data.
Scenario:¶
- You are running a database or stateful app that needs to save data even if the pod restarts.
- PVCs request storage from the cluster’s storage provisioner.
Hint: PersistentVolumeClaim, volumes
Solution
# 1. Create a PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
EOF
# 2. Create a pod using the PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: pvc-pod
spec:
containers:
- name: busybox
image: busybox
command: ["sleep", "3600"]
volumeMounts:
- mountPath: "/data"
name: my-storage
volumes:
- name: my-storage
persistentVolumeClaim:
claimName: my-pvc
EOF
# 3. Write data
kubectl exec pvc-pod -- sh -c "echo 'Hello Storage' > /data/test.txt"
# 4. Delete pod and recreate (data should persist - exercise for reader)
kubectl delete pod pvc-pod
# Re-apply pod yaml and check file
# Cleanup
kubectl delete pvc my-pvc
09. Multi-Container Pods¶
Create a pod with two containers: a main application and a sidecar helper.
Scenario:¶
- You need a helper process (like a log shipper or proxy) to run alongside your main application in the same network namespace.
- Multi-container pods share storage and network.
Hint: containers array in Pod spec
Solution
# 1. Create multi-container pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: multi-container-pod
spec:
containers:
- name: main-app
image: busybox
command: ["sh", "-c", "while true; do echo 'Main App' > /shared/index.html; sleep 5; done"]
volumeMounts:
- name: shared-data
mountPath: /shared
- name: sidecar
image: busybox
command: ["sh", "-c", "while true; do cat /shared/index.html; sleep 5; done"]
volumeMounts:
- name: shared-data
mountPath: /shared
volumes:
- name: shared-data
emptyDir: {}
EOF
# 2. Check logs of sidecar
kubectl logs multi-container-pod -c sidecar
# Cleanup
kubectl delete pod multi-container-pod
10. Jobs and CronJobs¶
Create a Job that runs to completion, and a CronJob that runs every minute.
Scenario:¶
- You have a batch process (database migration, report generation) or a periodic task.
- Jobs ensure a task finishes successfully; CronJobs schedule them.
Hint: kubectl create job, kubectl create cronjob
Solution
# 1. Create a Job
kubectl create job my-job --image=busybox -- echo "Job Completed"
# 2. Check job status
kubectl get jobs
kubectl logs job/my-job
# 3. Create a CronJob
kubectl create cronjob my-cron --image=busybox --schedule="*/1 * * * *" -- echo "Cron Run"
# 4. Wait for a run and check jobs created by cron
kubectl get jobs --watch
# Cleanup
kubectl delete job my-job
kubectl delete cronjob my-cron
11. Namespaces and Isolation¶
Create a new namespace and run a pod inside it.
Scenario:¶
- You want to separate development resources from production.
- Namespaces provide a scope for names and can be used to divide cluster resources.
Hint: kubectl create namespace, kubectl run -n
Solution
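A minimal walkthrough (the namespace and pod names are illustrative):

```shell
# 1. Create a namespace
kubectl create namespace dev-env

# 2. Run a pod inside it
kubectl run ns-pod --image=nginx:alpine -n dev-env

# 3. Verify: the pod is only visible within its namespace
kubectl get pods -n dev-env
kubectl get pods   # default namespace: ns-pod does not appear here

# Cleanup (deleting the namespace removes the pod too)
kubectl delete namespace dev-env
```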
12. Resource Limits and Quotas¶
Create a pod with CPU and Memory requests and limits.
Scenario:¶
- You need to ensure fair resource usage and prevent one container from starving others.
- Requests guarantee resources; limits cap them.
Hint: resources.requests, resources.limits
Solution
# 1. Create pod with limits
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: resource-pod
spec:
containers:
- name: nginx
image: nginx:alpine
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
EOF
# 2. Describe to see limits
kubectl describe pod resource-pod
# Cleanup
kubectl delete pod resource-pod
13. Liveness and Readiness Probes¶
Add a liveness probe to a pod to restart it if it freezes, and a readiness probe to control traffic flow.
Scenario:¶
- Your app might deadlock or take time to start up.
- Liveness probes restart unhealthy pods; readiness probes remove them from Service endpoints until ready.
Hint: livenessProbe, readinessProbe
Solution
# 1. Create pod with probes
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: probe-pod
spec:
containers:
- name: nginx
image: nginx:alpine
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 3
periodSeconds: 3
readinessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 5
periodSeconds: 5
EOF
# 2. Describe to see probe status
kubectl describe pod probe-pod
# Cleanup
kubectl delete pod probe-pod
14. Node Selection and Affinity¶
Schedule a pod on a specific node using a node selector (requires a node label).
Scenario:¶
- You have specialized hardware (GPU, SSD) on specific nodes.
- You need to ensure your pod lands on the correct node.
Hint: kubectl label nodes, nodeSelector
Solution
# 1. Label a node (use your node name, e.g., minikube or docker-desktop)
# Get node name
NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl label node $NODE_NAME disk=ssd
# 2. Create pod with nodeSelector
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: ssd-pod
spec:
containers:
- name: nginx
image: nginx:alpine
nodeSelector:
disk: ssd
EOF
# 3. Verify it's running
kubectl get pod ssd-pod -o wide
# Cleanup
kubectl delete pod ssd-pod
kubectl label node $NODE_NAME disk-
Kubernetes Service Tasks¶
- Hands-on Kubernetes exercises covering Services, Networking, and Service Discovery.
- Each task includes a description and a detailed solution with step-by-step instructions.
- Practice these tasks to master how Kubernetes exposes applications and manages traffic.
Table of Contents¶
- 01. Basic Service Exposure (ClusterIP)
- 02. NodePort & LoadBalancer
- 03. Service Discovery with DNS (FQDN)
- 04. Headless Services
- 05. ExternalName Service
- 06. Manual Endpoints
- 07. Session Affinity
- 08. Multi-Port Service
01. Basic Service Exposure (ClusterIP)¶
Run an nginx pod and expose it via a Service (ClusterIP) to access it from within the cluster.
Scenario:¶
- You have an application running in a pod, but it needs to be accessible by other pods.
- ClusterIP is the default service type, providing a stable internal IP.
Hint: kubectl expose, kubectl get services, kubectl port-forward
Solution
# 1. Run an nginx pod
kubectl run nginx-web --image=nginx:alpine --port=80
# 2. Expose the pod as a Service (ClusterIP by default)
kubectl expose pod nginx-web --name=nginx-svc --port=80 --target-port=80
# 3. Verify the service
kubectl get svc
# 4. Access it (using port-forward for local access)
kubectl port-forward svc/nginx-svc 8080:80
# (Open localhost:8080 in browser)
# Cleanup
kubectl delete pod nginx-web
kubectl delete svc nginx-svc
02. NodePort & LoadBalancer¶
Expose a deployment using NodePort to access it via the node’s IP, and then switch it to LoadBalancer (simulated or real).
Scenario:¶
- You need to make your application accessible from outside the Kubernetes cluster.
- NodePort opens a specific port on all nodes, while LoadBalancer provisions an external IP (cloud provider dependent).
Hint: type: NodePort, type: LoadBalancer
Solution
# 1. Create a deployment
kubectl create deployment web-server --image=nginx:alpine --replicas=2
# 2. Expose as NodePort
kubectl expose deployment web-server --type=NodePort --name=web-nodeport --port=80
# 3. Get the allocated NodePort (e.g., 30xxx)
kubectl get svc web-nodeport
# 4. (Optional) Patch it to be a LoadBalancer
kubectl patch svc web-nodeport -p '{"spec": {"type": "LoadBalancer"}}'
# 5. Verify external IP (it might stay <pending> on Minikube/Kind without addons)
kubectl get svc web-nodeport
# Cleanup
kubectl delete deployment web-server
kubectl delete svc web-nodeport
03. Service Discovery with DNS (FQDN)¶
Create two pods in different namespaces and verify they can communicate using the Fully Qualified Domain Name (FQDN).
Scenario:¶
- Microservices often live in different namespaces (e.g., frontend vs backend).
- You need to ensure they can talk to each other using Kubernetes internal DNS.
Hint: nslookup, <service>.<namespace>.svc.cluster.local
Solution
# 1. Create two namespaces
kubectl create ns app-a
kubectl create ns app-b
# 2. Run a target pod and service in app-b
kubectl run backend --image=nginx:alpine -n app-b
kubectl expose pod backend --name=backend-svc --port=80 -n app-b
# 3. Run a client pod in app-a
kubectl run client --image=busybox -n app-a -- sleep 3600
# 4. Test DNS resolution from client to backend
# FQDN format: service-name.namespace.svc.cluster.local
kubectl exec -it client -n app-a -- nslookup backend-svc.app-b.svc.cluster.local
# 5. Test connectivity
kubectl exec -it client -n app-a -- wget -O- backend-svc.app-b.svc.cluster.local
# Cleanup
kubectl delete ns app-a app-b
04. Headless Services¶
Create a Headless Service (ClusterIP: None) and verify that DNS returns the IPs of the individual pods instead of a single Service IP.
Scenario:¶
- You are deploying a distributed stateful application (like Cassandra, MongoDB, or Kafka) that needs to discover all peer nodes directly.
- Headless services allow direct pod-to-pod communication without load balancing.
Hint: clusterIP: None
Solution
# 1. Create a Headless Service
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
name: headless-svc
spec:
clusterIP: None
selector:
app: headless-app
ports:
- port: 80
EOF
# 2. Create pods matching the selector
kubectl run pod-1 --image=nginx:alpine --labels=app=headless-app
kubectl run pod-2 --image=nginx:alpine --labels=app=headless-app
# 3. Verify DNS resolution (should return multiple IPs)
kubectl run dns-test --image=busybox --restart=Never -- nslookup headless-svc
# 4. Check logs to see the IPs
kubectl logs dns-test
# Cleanup
kubectl delete pod pod-1 pod-2 dns-test
kubectl delete svc headless-svc
05. ExternalName Service¶
Create a Service that maps to an external DNS name (e.g., google.com) instead of a pod selector.
Scenario:¶
- You want to refer to an external database or API (e.g., AWS RDS, an external API) using a local Kubernetes service name.
- This allows you to change the external endpoint later without changing your application code.
Hint: type: ExternalName, externalName: example.com
Solution
# 1. Create ExternalName service
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
name: my-external-service
spec:
type: ExternalName
externalName: google.com
EOF
# 2. Test resolution (it should be a CNAME to google.com)
kubectl run ext-test --image=busybox --restart=Never -- nslookup my-external-service
# 3. Check logs
kubectl logs ext-test
# Cleanup
kubectl delete svc my-external-service
kubectl delete pod ext-test
06. Manual Endpoints¶
Create a Service without a selector, and manually create an Endpoints object to point to an external IP (or a specific pod IP).
Scenario:¶
- You want to use a Kubernetes Service to point to a specific IP address that isn’t managed by a Kubernetes Pod selector (e.g., a legacy server or a database outside the cluster).
Hint: kind: Endpoints, same name as Service
Solution
# 1. Create a Service without a selector
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
name: manual-svc
spec:
ports:
- port: 80
targetPort: 80
EOF
# 2. Create Endpoints manually (Use an IP you know, e.g., 1.1.1.1 or a pod IP)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
name: manual-svc
subsets:
- addresses:
- ip: 1.1.1.1
ports:
- port: 80
EOF
# 3. Describe service to see endpoints
kubectl describe svc manual-svc
# Cleanup
kubectl delete svc manual-svc
kubectl delete endpoints manual-svc
07. Session Affinity¶
Create a Service with sessionAffinity: ClientIP and verify that requests from the same client pod go to the same backend pod (if possible to observe).
Scenario:¶
- Your application stores session state locally in the container (not recommended, but it happens).
- You need to ensure a user always hits the same pod during their session.
Hint: sessionAffinity: ClientIP
Solution
# 1. Create a deployment with 3 replicas
kubectl create deployment session-app --image=nginx:alpine --replicas=3
# 2. Expose with ClientIP affinity
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
name: session-svc
spec:
selector:
app: session-app
ports:
- port: 80
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 10800
EOF
# 3. Verify affinity setting
kubectl describe svc session-svc
# Cleanup
kubectl delete deployment session-app
kubectl delete svc session-svc
08. Multi-Port Service¶
Create a Service that exposes both port 80 (HTTP) and 443 (HTTPS) for the same set of pods.
Scenario:¶
- Your application serves both HTTP and HTTPS traffic.
- You need a single Service to handle both ports.
Hint: ports array in Service spec
Solution
# 1. Create a pod that exposes port 80
kubectl run web-multi --image=nginx:alpine --port=80
# 2. Create a multi-port service
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
name: multi-port-svc
spec:
selector:
run: web-multi
ports:
- name: http
port: 80
targetPort: 80
- name: https
port: 443
targetPort: 80 # Mapping 443 to 80 just for demo since nginx listens on 80
EOF
# 3. Verify ports
kubectl get svc multi-port-svc
# Cleanup
kubectl delete pod web-multi
kubectl delete svc multi-port-svc
Kubernetes Helm Chart Tasks¶
- Hands-on Kubernetes exercises covering Helm chart creation, packaging, deployment, and best practices.
- Each task includes a description, scenario, and a detailed solution with step-by-step instructions.
- Practice these tasks to master Helm from basic chart scaffolding to advanced templating and chart repositories.
Table of Contents¶
- 01. Scaffold a Helm Chart
- 02. Explore the Chart Structure
- 03. Deploy an Nginx-Based Chart
- 04. Customize the Welcome Page with Current Date & Time
- 05. Add a Service
- 06. Add Two Ingress Resources with Different Paths
- 07. Add an ExternalName Service
- 08. Values Overrides and Environments
- 09. Template Helpers and Named Templates
- 10. Template Control Flow (if / range / with)
- 11. Chart Dependencies (Subcharts)
- 12. Linting, Dry-Run, and Debugging
- 13. Package and Host a Chart Repository
- 14. Upgrade, Rollback, and Release History
- 15. Hooks (Pre-install / Post-install)
- 16. Use helm status to Inspect a Release
- 17. Extract Values with helm get values
- 18. Show Chart Values with helm show values
- 19. Search Charts with helm search repo
- 20. Update Repositories with helm repo update
- 21. Run Chart Tests with helm test
- 22. Use helm get all to Retrieve Complete Release Info
- 23. Use helm list with Filters and Formatting
- 24. Chain Multiple Commands for Release Management
01. Scaffold a Helm Chart¶
Create a new Helm chart from scratch using helm create and explore the generated files.
Scenario:¶
- You need to package an application for Kubernetes and want a standardized project structure.
- helm create generates a best-practice skeleton you can customize.
Hint: helm create, tree
Solution
# 1. Create a new chart named "my-nginx-app"
helm create my-nginx-app
# 2. Explore the generated structure
tree my-nginx-app/
# Output:
# my-nginx-app/
# ├── Chart.yaml          # Chart metadata (name, version, description)
# ├── values.yaml         # Default configuration values
# ├── charts/             # Dependency charts (subcharts)
# ├── templates/          # Kubernetes manifest templates
# │   ├── NOTES.txt       # Post-install usage notes
# │   ├── _helpers.tpl    # Named template definitions
# │   ├── deployment.yaml
# │   ├── hpa.yaml
# │   ├── ingress.yaml
# │   ├── service.yaml
# │   ├── serviceaccount.yaml
# │   └── tests/
# │       └── test-connection.yaml
# └── .helmignore         # Files to exclude when packaging
02. Explore the Chart Structure¶
Inspect Chart.yaml and values.yaml to understand how Helm charts are configured.
Scenario:¶
- Before modifying a chart, you need to understand what each file does.
- Chart.yaml defines the chart identity; values.yaml drives all the template rendering.
Hint: cat Chart.yaml, cat values.yaml
Solution
# 1. Inspect Chart.yaml
cat my-nginx-app/Chart.yaml
# Key fields:
# - apiVersion: v2 (Helm 3 chart)
# - name: my-nginx-app
# - version: 0.1.0 (chart version - bump this on changes)
# - appVersion: "1.16.0" (the app version being deployed)
# 2. Inspect values.yaml
cat my-nginx-app/values.yaml
# Key fields:
# - replicaCount: 1
# - image.repository: nginx
# - image.tag: "" (defaults to appVersion from Chart.yaml)
# - service.type: ClusterIP
# - service.port: 80
# - ingress.enabled: false
# 3. See how values are consumed in templates
grep -n '{{ .Values' my-nginx-app/templates/deployment.yaml
# 4. Render the templates without deploying (dry-run)
helm template my-release my-nginx-app/
03. Deploy an Nginx-Based Chart¶
Install the chart to your cluster using the default nginx image, verify the deployment, and access nginx.
Scenario:¶
- You want to deploy a basic nginx web server using Helm to validate the chart works before customizing it.
Hint: helm install, kubectl get all, kubectl port-forward
Solution
# 1. Install the chart
helm install my-nginx my-nginx-app/
# 2. Verify all resources were created
kubectl get all -l app.kubernetes.io/instance=my-nginx
# 3. Check the release
helm list
# 4. Access the application via port-forward
kubectl port-forward svc/my-nginx-my-nginx-app 8080:80
# 5. In another terminal or browser
curl http://localhost:8080
# Should show the default nginx welcome page
# 6. Uninstall when done
helm uninstall my-nginx
04. Customize the Welcome Page with Current Date & Time¶
Create a ConfigMap that generates a custom HTML welcome page showing the current date and time, and mount it into the nginx container.
Scenario:¶
- You want to display dynamic content (deployment timestamp) on the nginx welcome page.
- This demonstrates how Helm templates can inject install-time values into application configuration.
Hint: now, date, ConfigMap volume mount, {{ .Release }}
Solution
Step 1: Create the ConfigMap template [templates/configmap-html.yaml]¶
cat > my-nginx-app/templates/configmap-html.yaml << 'TEMPLATE'
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "my-nginx-app.fullname" . }}-html
labels:
{{- include "my-nginx-app.labels" . | nindent 4 }}
data:
index.html: |
<!DOCTYPE html>
<html>
<head>
<title>{{ .Values.welcomePage.title | default "Welcome" }}</title>
<style>
body {
font-family: Arial, sans-serif;
display: flex;
justify-content: center;
align-items: center;
min-height: 100vh;
margin: 0;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
}
.container { text-align: center; }
h1 { font-size: 2.5em; }
.info { font-size: 1.2em; margin: 10px 0; }
.time { font-size: 3em; font-weight: bold; margin: 20px 0; }
</style>
</head>
<body>
<div class="container">
<h1>{{ .Values.welcomePage.title | default "Welcome to My Nginx App" }}</h1>
<p class="info">Release: <strong>{{ .Release.Name }}</strong></p>
<p class="info">Namespace: <strong>{{ .Release.Namespace }}</strong></p>
<p class="info">Chart Version: <strong>{{ .Chart.Version }}</strong></p>
<p class="info">App Version: <strong>{{ .Chart.AppVersion }}</strong></p>
<p class="time">Deployed at: {{ now | date "2006-01-02 15:04:05 MST" }}</p>
{{- if .Values.welcomePage.message }}
<p class="info">{{ .Values.welcomePage.message }}</p>
{{- end }}
</div>
</body>
</html>
TEMPLATE
Step 2: Add welcome page values to values.yaml¶
cat >> my-nginx-app/values.yaml << 'EOF'
# Custom welcome page configuration
welcomePage:
title: "Welcome to My Nginx App"
message: "Deployed with Helm!"
EOF
Step 3: Update the deployment template to mount the ConfigMap¶
Edit my-nginx-app/templates/deployment.yaml - add the volumeMounts and volumes:
# Add volumeMounts under the container spec and volumes under the pod spec.
# The two snippets below show exactly what to add:
# In the container spec, add:
# volumeMounts:
# - name: html-volume
# mountPath: /usr/share/nginx/html
# readOnly: true
# In the pod spec (same level as containers), add:
# volumes:
# - name: html-volume
# configMap:
# name: {{ include "my-nginx-app.fullname" . }}-html
For a quick approach, replace the entire deployment template:
cat > my-nginx-app/templates/deployment.yaml << 'TEMPLATE'
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "my-nginx-app.fullname" . }}
labels:
{{- include "my-nginx-app.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "my-nginx-app.selectorLabels" . | nindent 6 }}
template:
metadata:
annotations:
# Force pod restart on ConfigMap changes
checksum/html: {{ include (print $.Template.BasePath "/configmap-html.yaml") . | sha256sum }}
{{- with .Values.podAnnotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "my-nginx-app.labels" . | nindent 8 }}
{{- with .Values.podLabels }}
{{- toYaml . | nindent 8 }}
{{- end }}
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "my-nginx-app.serviceAccountName" . }}
{{- with .Values.podSecurityContext }}
securityContext:
{{- toYaml . | nindent 8 }}
{{- end }}
containers:
- name: {{ .Chart.Name }}
{{- with .Values.securityContext }}
securityContext:
{{- toYaml . | nindent 12 }}
{{- end }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: {{ .Values.service.port }}
protocol: TCP
livenessProbe:
{{- toYaml .Values.livenessProbe | nindent 12 }}
readinessProbe:
{{- toYaml .Values.readinessProbe | nindent 12 }}
{{- with .Values.resources }}
resources:
{{- toYaml . | nindent 12 }}
{{- end }}
volumeMounts:
- name: html-volume
mountPath: /usr/share/nginx/html
readOnly: true
{{- with .Values.volumeMounts }}
{{- toYaml . | nindent 12 }}
{{- end }}
volumes:
- name: html-volume
configMap:
name: {{ include "my-nginx-app.fullname" . }}-html
{{- with .Values.volumes }}
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
TEMPLATE
Step 4: Install and verify¶
# Install (or upgrade if already installed)
helm upgrade --install my-nginx my-nginx-app/
# Port-forward and check the custom page
kubectl port-forward svc/my-nginx-my-nginx-app 8080:80
# In another terminal
curl http://localhost:8080
# Should show the custom HTML page with current date/time and release info
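A note on the `checksum/html` annotation used above: because the rendered ConfigMap content is hashed into a pod-template annotation, any change to the welcome page rolls the pods automatically. A quick way to observe this with the same release:

```shell
# Changing a welcomePage value changes the ConfigMap, which changes the
# checksum annotation on the pod template, which triggers a rolling restart
helm upgrade my-nginx my-nginx-app/ --set welcomePage.title="New Title"
kubectl rollout status deployment/my-nginx-my-nginx-app
kubectl get pods   # fresh pods with a new AGE
```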
05. Add a Service¶
Verify the Service template supports ClusterIP, NodePort, and LoadBalancer types via values.yaml.
Scenario:¶
- You want to expose your application with different service types depending on the environment (e.g., ClusterIP for dev, NodePort for Minikube, LoadBalancer for cloud).
- Your chart therefore has to be flexible enough to switch service types through configuration alone.
- The default helm create scaffold already includes a service template; your task is to understand and test it.
Hint: --set service.type=NodePort, helm upgrade
Solution
# 1. Check the current service template
cat my-nginx-app/templates/service.yaml
# 2. The default template already supports configurable type:
# type: {{ .Values.service.type }}
# port: {{ .Values.service.port }}
# 3. Install with ClusterIP (default)
helm upgrade --install my-nginx my-nginx-app/
# 4. Verify
kubectl get svc -l app.kubernetes.io/instance=my-nginx
# TYPE should be ClusterIP
# 5. Upgrade to NodePort
helm upgrade my-nginx my-nginx-app/ --set service.type=NodePort
# 6. Verify
kubectl get svc -l app.kubernetes.io/instance=my-nginx
# TYPE should now be NodePort
# 7. Upgrade to LoadBalancer
helm upgrade my-nginx my-nginx-app/ --set service.type=LoadBalancer
# 8. Verify
kubectl get svc -l app.kubernetes.io/instance=my-nginx
# TYPE should now be LoadBalancer (EXTERNAL-IP may stay <pending> on local clusters)
06. Add Two Ingress Resources with Different Paths¶
Create two Ingress resources: one serving the main app at / and another serving a health/status endpoint at /status.
Scenario:¶
- Your application has a main frontend and a separate status/health page.
- You want to route traffic using different URL paths to the same backend, each with its own Ingress resource.
- This is useful when different Ingress resources need different annotations (rate limiting, auth, etc.).
Prerequisites: An Ingress controller must be installed (e.g., ingress-nginx).
Hint: Two separate Ingress templates, pathType: Prefix
Solution
Step 1: Create the main Ingress template [templates/ingress-main.yaml]
cat > my-nginx-app/templates/ingress-main.yaml << 'TEMPLATE'
{{- if .Values.ingress.main.enabled -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: {{ include "my-nginx-app.fullname" . }}-main
labels:
{{- include "my-nginx-app.labels" . | nindent 4 }}
{{- with .Values.ingress.main.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.main.className }}
ingressClassName: {{ .Values.ingress.main.className }}
{{- end }}
rules:
- host: {{ .Values.ingress.main.host | quote }}
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: {{ include "my-nginx-app.fullname" . }}
port:
number: {{ .Values.service.port }}
{{- if .Values.ingress.main.tls }}
tls:
{{- toYaml .Values.ingress.main.tls | nindent 4 }}
{{- end }}
{{- end }}
TEMPLATE
Step 2: Create the status Ingress template [templates/ingress-status.yaml]
cat > my-nginx-app/templates/ingress-status.yaml << 'TEMPLATE'
{{- if .Values.ingress.status.enabled -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: {{ include "my-nginx-app.fullname" . }}-status
labels:
{{- include "my-nginx-app.labels" . | nindent 4 }}
{{- with .Values.ingress.status.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.status.className }}
ingressClassName: {{ .Values.ingress.status.className }}
{{- end }}
rules:
- host: {{ .Values.ingress.status.host | quote }}
http:
paths:
- path: /status
pathType: Prefix
backend:
service:
name: {{ include "my-nginx-app.fullname" . }}
port:
number: {{ .Values.service.port }}
{{- if .Values.ingress.status.tls }}
tls:
{{- toYaml .Values.ingress.status.tls | nindent 4 }}
{{- end }}
{{- end }}
TEMPLATE
Step 3: Remove the default ingress template (optional) and update values.yaml
# Remove the default generated ingress template to avoid confusion
rm my-nginx-app/templates/ingress.yaml
# Add ingress values to values.yaml
cat >> my-nginx-app/values.yaml << 'EOF'
# Ingress configuration - two separate Ingress resources
ingress:
main:
enabled: true
className: nginx
host: my-nginx.local
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
tls: []
status:
enabled: true
className: nginx
host: my-nginx.local
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
# Example: different rate limit for status endpoint
nginx.ingress.kubernetes.io/limit-rps: "10"
tls: []
EOF
Step 4: Deploy and verify
# Upgrade the release
helm upgrade --install my-nginx my-nginx-app/
# Verify both Ingress resources were created
kubectl get ingress -l app.kubernetes.io/instance=my-nginx
# Expected output:
# NAME CLASS HOSTS ADDRESS PORTS AGE
# my-nginx-my-nginx-app-main nginx my-nginx.local 80 5s
# my-nginx-my-nginx-app-status nginx my-nginx.local 80 5s
# Describe each to see the path rules
kubectl describe ingress my-nginx-my-nginx-app-main
kubectl describe ingress my-nginx-my-nginx-app-status
# Test (add host entry or use curl with Host header)
# curl -H "Host: my-nginx.local" http://<INGRESS_IP>/
# curl -H "Host: my-nginx.local" http://<INGRESS_IP>/status
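To test from a workstation without touching DNS, you can add a hosts entry pointing at the ingress controller's address (the address varies by environment; on minikube it is typically the output of `minikube ip`):

```shell
# Map the ingress hostname to the controller's IP (substitute your own)
echo "<INGRESS_IP> my-nginx.local" | sudo tee -a /etc/hosts
curl http://my-nginx.local/        # served by the main Ingress
curl http://my-nginx.local/status  # served by the status Ingress
```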
07. Add an ExternalName Service¶
Add a second Service of type ExternalName that maps a Kubernetes service name to an external DNS name.
Scenario:¶
- Your application needs to connect to an external API or database (e.g., an RDS instance, a SaaS endpoint).
- By using an ExternalName service, you can refer to it by a local name inside the cluster and change the target later without modifying application code.
Hint: type: ExternalName, externalName
Solution
Step 1: Create the ExternalName Service template [templates/service-external.yaml]
cat > my-nginx-app/templates/service-external.yaml << 'TEMPLATE'
{{- if .Values.externalService.enabled -}}
apiVersion: v1
kind: Service
metadata:
name: {{ include "my-nginx-app.fullname" . }}-external
labels:
{{- include "my-nginx-app.labels" . | nindent 4 }}
spec:
type: ExternalName
externalName: {{ .Values.externalService.host | quote }}
{{- if .Values.externalService.ports }}
ports:
{{- toYaml .Values.externalService.ports | nindent 4 }}
{{- end }}
{{- end }}
TEMPLATE
Step 2: Add values to values.yaml
cat >> my-nginx-app/values.yaml << 'EOF'
# ExternalName service - maps a local name to an external DNS
externalService:
enabled: true
host: api.example.com
ports:
- port: 443
protocol: TCP
EOF
Step 3: Deploy and verify
# Upgrade the release
helm upgrade --install my-nginx my-nginx-app/
# Verify the ExternalName service
kubectl get svc -l app.kubernetes.io/instance=my-nginx
# Should show something like:
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
# my-nginx-my-nginx-app ClusterIP 10.x.x.x <none> 80/TCP
# my-nginx-my-nginx-app-external ExternalName <none> api.example.com 443/TCP
# Test DNS resolution from inside the cluster
kubectl run dns-check --image=busybox --restart=Never \
-- nslookup my-nginx-my-nginx-app-external
kubectl logs dns-check
# Should resolve to api.example.com
# Cleanup test pod
kubectl delete pod dns-check
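For reference, the template above renders to roughly this plain manifest (names assume the `my-nginx` release); in-cluster lookups of the local name return a DNS CNAME pointing at the external host:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-nginx-my-nginx-app-external
spec:
  type: ExternalName
  externalName: "api.example.com"
  ports:
    - port: 443
      protocol: TCP
```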
08. Values Overrides and Environments¶
Use multiple values files to manage different environments (dev, staging, production).
Scenario:¶
- You have one chart but need different configurations per environment (replica count, image tag, resource limits).
- Helm supports layering multiple -f values files and --set overrides.
Hint: helm install -f, --set, multiple values files
Solution
# 1. Create a dev values file
cat > values-dev.yaml << 'EOF'
replicaCount: 1
image:
tag: "alpine"
resources:
limits:
cpu: 100m
memory: 128Mi
welcomePage:
title: "DEV Environment"
message: "This is the development instance"
EOF
# 2. Create a production values file
cat > values-prod.yaml << 'EOF'
replicaCount: 3
image:
tag: "stable"
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 250m
memory: 256Mi
welcomePage:
title: "PRODUCTION"
message: "Production instance - handle with care"
service:
type: LoadBalancer
EOF
# 3. Install with dev values
helm upgrade --install my-nginx-dev my-nginx-app/ \
-f values-dev.yaml \
--namespace dev --create-namespace
# 4. Install with prod values
helm upgrade --install my-nginx-prod my-nginx-app/ \
-f values-prod.yaml \
--namespace prod --create-namespace
# 5. Verify different configurations
kubectl get deployment -n dev -o wide
kubectl get deployment -n prod -o wide
# 6. Override a single value on top of a values file
helm upgrade my-nginx-dev my-nginx-app/ \
-f values-dev.yaml \
--set replicaCount=2 \
--namespace dev
# Cleanup
helm uninstall my-nginx-dev -n dev
helm uninstall my-nginx-prod -n prod
kubectl delete ns dev prod
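When several value sources are combined, later sources win. A sketch of the precedence order (the file names here are hypothetical):

```shell
# Precedence, lowest to highest:
#   chart values.yaml  <  -f files (left to right)  <  --set flags
helm install my-nginx my-nginx-app/ \
  -f values-base.yaml \
  -f values-prod.yaml \
  --set image.tag=hotfix-1
```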
09. Template Helpers and Named Templates¶
Create a custom named template in _helpers.tpl and use it across multiple templates.
Scenario:¶
- You have repeated logic (e.g., generating labels, resource names) across templates.
- Named templates (partials) in _helpers.tpl let you define reusable snippets.
Hint: define, include, _helpers.tpl
Solution
# 1. Inspect existing helpers
cat my-nginx-app/templates/_helpers.tpl
# You'll see templates like:
# {{- define "my-nginx-app.name" -}}           -> Chart name
# {{- define "my-nginx-app.fullname" -}}       -> Release-qualified name
# {{- define "my-nginx-app.labels" -}}         -> Standard labels
# {{- define "my-nginx-app.selectorLabels" -}} -> Selector labels
# 2. Add a custom helper - e.g., environment label
cat >> my-nginx-app/templates/_helpers.tpl << 'EOF'
{{/*
Custom: Generate environment-specific annotations
*/}}
{{- define "my-nginx-app.envAnnotations" -}}
app.kubernetes.io/environment: {{ .Values.environment | default "dev" }}
app.kubernetes.io/team: {{ .Values.team | default "platform" }}
{{- end }}
EOF
# 3. Use it in a template (e.g., deployment.yaml metadata.annotations):
# annotations:
# {{- include "my-nginx-app.envAnnotations" . | nindent 4 }}
# 4. Add default values
cat >> my-nginx-app/values.yaml << 'EOF'
# Environment metadata
environment: dev
team: platform
EOF
# 5. Test rendering
helm template my-nginx my-nginx-app/ | grep -A2 "environment"
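When the helper is wired into a template's metadata as described in step 3, the default values above render to an ordinary annotation block:

```yaml
annotations:
  app.kubernetes.io/environment: dev
  app.kubernetes.io/team: platform
```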
10. Template Control Flow (if / range / with)¶
Practice Helm template control structures: conditionals, loops, and scoping.
Scenario:¶
- You need to conditionally render resources, iterate over lists, or scope into nested values.
- Go template control flow is essential for writing flexible Helm charts.
Hint: {{- if }}, {{- range }}, {{- with }}
Solution
# 1. Conditional: only create a resource if enabled
# Already used in ingress templates:
# {{- if .Values.ingress.main.enabled -}}
# ...
# {{- end }}
# 2. Range: iterate over a list
# Example: Add multiple environment variables from a values list
# In values.yaml:
cat >> my-nginx-app/values.yaml << 'EOF'
# Extra environment variables
extraEnv:
- name: LOG_LEVEL
value: "info"
- name: APP_MODE
value: "production"
EOF
# In deployment.yaml, under containers[].env:
# {{- range .Values.extraEnv }}
# - name: {{ .name }}
# value: {{ .value | quote }}
# {{- end }}
# 3. With: scope into a map
# {{- with .Values.nodeSelector }}
# nodeSelector:
# {{- toYaml . | nindent 8 }}
# {{- end }}
# 4. Test the rendering
helm template my-nginx my-nginx-app/ --set extraEnv[0].name=DEBUG,extraEnv[0].value=true
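With the `range` snippet wired into the container spec, the `extraEnv` values above render to a plain env list:

```yaml
env:
  - name: LOG_LEVEL
    value: "info"
  - name: APP_MODE
    value: "production"
```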
11. Chart Dependencies (Subcharts)¶
Add a dependency (e.g., Redis) as a subchart and configure it through the parent values.yaml.
Scenario:¶
- Your application needs a Redis cache alongside nginx.
- Instead of writing Redis manifests from scratch, you depend on an existing chart from a repository.
Hint: Chart.yaml dependencies, helm dependency update
Solution
# 1. Add dependency to Chart.yaml
cat >> my-nginx-app/Chart.yaml << 'EOF'
dependencies:
- name: redis
version: "~18.0"
repository: "https://charts.bitnami.com/bitnami"
condition: redis.enabled
EOF
# 2. Add redis configuration in values.yaml
cat >> my-nginx-app/values.yaml << 'EOF'
# Redis subchart configuration
redis:
enabled: false # Set to true to deploy Redis alongside nginx
architecture: standalone
auth:
enabled: false
EOF
# 3. Build dependencies (downloads the redis chart into charts/)
helm dependency update my-nginx-app/
# 4. Verify
ls my-nginx-app/charts/
# Should show: redis-18.x.x.tgz
# 5. Install with Redis enabled
helm upgrade --install my-nginx my-nginx-app/ --set redis.enabled=true
# 6. Verify Redis pods
kubectl get pods -l app.kubernetes.io/instance=my-nginx
# Cleanup
helm uninstall my-nginx
12. Linting, Dry-Run, and Debugging¶
Use Helm’s built-in tools to validate, debug, and troubleshoot your chart before deploying.
Scenario:¶
- You modified several templates and want to catch errors before deploying to the cluster.
- Helm provides lint, template, dry-run, and debug tools for this purpose.
Hint: helm lint, helm template, --dry-run, --debug
Solution
# 1. Lint - checks for common errors and best practices
helm lint my-nginx-app/
# With values overrides
helm lint my-nginx-app/ -f values-dev.yaml
# 2. Template - render manifests locally (no cluster needed)
helm template my-release my-nginx-app/ > rendered.yaml
cat rendered.yaml
# 3. Dry-run - simulates install against the cluster (validates with API server)
helm install my-nginx my-nginx-app/ --dry-run
# 4. Dry-run + Debug - shows rendered templates AND computed values
helm install my-nginx my-nginx-app/ --dry-run --debug
# 5. Get rendered templates for a deployed release
helm get manifest my-nginx
# 6. Get the computed values for a deployed release
helm get values my-nginx
# 7. Get all information about a release
helm get all my-nginx
13. Package and Host a Chart Repository¶
Package the chart and host it in a Git-based chart repository using GitHub Pages.
Scenario:¶
- You want to share your Helm chart with your team or the community.
- A Helm chart repository is simply a web server hosting an index.yaml and .tgz chart packages.
- GitHub Pages is a free and easy way to host a chart repo.
Hint: helm package, helm repo index, GitHub Pages
Solution
# -- Step 1: Package the chart --
helm package my-nginx-app/
# Output: my-nginx-app-0.1.0.tgz
# -- Step 2: Create a chart repository on GitHub --
# Create a new GitHub repository (e.g., "helm-charts")
# Clone it locally:
git clone https://github.com/<your-username>/helm-charts.git
cd helm-charts
# Create a docs/ directory (GitHub Pages will serve from here)
mkdir -p docs
# Move the packaged chart
cp ../my-nginx-app-0.1.0.tgz docs/
# -- Step 3: Generate the repository index --
helm repo index docs/ --url https://<your-username>.github.io/helm-charts/
# Verify the index
cat docs/index.yaml
# -- Step 4: Push to GitHub --
git add .
git commit -m "Add my-nginx-app chart"
git push origin main
# -- Step 5: Enable GitHub Pages --
# Go to: GitHub repo -> Settings -> Pages
# Set Source: Deploy from branch -> main -> /docs
# Save and wait for deployment
# -- Step 6: Add the repo to Helm --
helm repo add my-charts https://<your-username>.github.io/helm-charts/
helm repo update
# Verify the chart is available
helm search repo my-charts
# -- Step 7: Install from the repository --
helm install my-nginx my-charts/my-nginx-app
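For reference, the `index.yaml` generated in Step 3 looks roughly like this (digest, timestamps, and description depend on your chart; fields abbreviated):

```yaml
apiVersion: v1
entries:
  my-nginx-app:
    - apiVersion: v2
      name: my-nginx-app
      version: 0.1.0
      description: A Helm chart for Kubernetes
      digest: <sha256-of-the-package>
      urls:
        - https://<your-username>.github.io/helm-charts/my-nginx-app-0.1.0.tgz
generated: "2024-01-01T00:00:00Z"
```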
Alternative: Use OCI Registry (Helm 3.8+)
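Since Helm 3.8, charts can also be pushed to any OCI-compatible registry instead of a static web server. A minimal sketch, assuming `ghcr.io` as the registry (substitute your own):

```shell
# Authenticate against the registry
helm registry login ghcr.io -u <your-username>
# Push the packaged chart
helm push my-nginx-app-0.1.0.tgz oci://ghcr.io/<your-username>/charts
# Install directly from the OCI reference (no 'helm repo add' needed)
helm install my-nginx oci://ghcr.io/<your-username>/charts/my-nginx-app --version 0.1.0
```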
14. Upgrade, Rollback, and Release History¶
Upgrade a release with new values, inspect its history, and rollback to a previous revision.
Scenario:¶
- You deployed version 1 of your chart, then upgraded to version 2 with a bad configuration.
- You need to quickly roll back to the known-good state.
Hint: helm upgrade, helm history, helm rollback
Solution
# 1. Initial install (revision 1)
helm install my-nginx my-nginx-app/ \
--set welcomePage.title="Version 1"
# 2. Upgrade to revision 2 (change the title)
helm upgrade my-nginx my-nginx-app/ \
--set welcomePage.title="Version 2 - BROKEN"
# 3. Check release history
helm history my-nginx
# Output:
# REVISION STATUS DESCRIPTION
# 1 superseded Install complete
# 2 deployed Upgrade complete
# 4. Rollback to revision 1
helm rollback my-nginx 1
# 5. Verify history (now shows 3 revisions)
helm history my-nginx
# Output:
# REVISION STATUS DESCRIPTION
# 1 superseded Install complete
# 2 superseded Upgrade complete
# 3 deployed Rollback to 1
# 6. Verify the running app shows "Version 1" again
kubectl port-forward svc/my-nginx-my-nginx-app 8080:80
curl http://localhost:8080 | grep "Version"
# Cleanup
helm uninstall my-nginx
15. Hooks (Pre-install / Post-install)¶
Create Helm hooks that run a Job before and after chart installation.
Scenario:¶
- You need to run a database migration before the app starts, or send a notification after deployment.
- Helm hooks let you run resources at specific points in the release lifecycle.
Hint: helm.sh/hook annotation, pre-install, post-install
Solution
Step 1: Create a pre-install hook [templates/pre-install-job.yaml]
cat > my-nginx-app/templates/pre-install-job.yaml << 'TEMPLATE'
apiVersion: batch/v1
kind: Job
metadata:
name: {{ include "my-nginx-app.fullname" . }}-pre-install
labels:
{{- include "my-nginx-app.labels" . | nindent 4 }}
annotations:
"helm.sh/hook": pre-install,pre-upgrade
"helm.sh/hook-weight": "-5"
"helm.sh/hook-delete-policy": hook-succeeded
spec:
template:
spec:
restartPolicy: Never
containers:
- name: pre-install
image: busybox
command:
- sh
- -c
- |
echo "=== Pre-install hook ==="
echo "Running pre-flight checks..."
echo "Release: {{ .Release.Name }}"
echo "Namespace: {{ .Release.Namespace }}"
echo "Chart: {{ .Chart.Name }}-{{ .Chart.Version }}"
echo "Pre-install complete!"
TEMPLATE
Step 2: Create a post-install hook [templates/post-install-job.yaml]
cat > my-nginx-app/templates/post-install-job.yaml << 'TEMPLATE'
apiVersion: batch/v1
kind: Job
metadata:
name: {{ include "my-nginx-app.fullname" . }}-post-install
labels:
{{- include "my-nginx-app.labels" . | nindent 4 }}
annotations:
"helm.sh/hook": post-install,post-upgrade
"helm.sh/hook-weight": "5"
"helm.sh/hook-delete-policy": hook-succeeded
spec:
template:
spec:
restartPolicy: Never
containers:
- name: post-install
image: busybox
command:
- sh
- -c
- |
echo "=== Post-install hook ==="
echo "Deployment verified!"
echo "Release: {{ .Release.Name }}"
echo "Post-install complete!"
TEMPLATE
Step 3: Deploy and observe hooks
# Install and watch the hooks execute
helm install my-nginx my-nginx-app/
# Check jobs (hook jobs auto-delete on success due to hook-delete-policy)
kubectl get jobs
# If you want to see the logs, remove the hook-delete-policy temporarily
# and check:
kubectl logs job/my-nginx-my-nginx-app-pre-install
kubectl logs job/my-nginx-my-nginx-app-post-install
# Hook execution order:
# 1. pre-install Job runs
# 2. Chart resources are created (Deployment, Service, etc.)
# 3. post-install Job runs
# Cleanup
helm uninstall my-nginx
Diagram: Helm Chart Architecture¶
+----------------------------------------------------------------+
|                   Helm Chart: my-nginx-app                     |
+----------------------------------------------------------------+
|                                                                |
|  Chart.yaml ------- name, version, dependencies                |
|                                                                |
|  values.yaml ------ defaults <--- values-dev.yaml              |
|                              <--- values-prod.yaml             |
|                              <--- --set overrides              |
|                                                                |
|  templates/ --+-- deployment.yaml ----> Deployment             |
|               +-- service.yaml -------> Service (ClusterIP)    |
|               +-- service-external ---> Service (ExternalName) |
|               +-- ingress-main -------> Ingress (path: /)      |
|               +-- ingress-status -----> Ingress (path: /status)|
|               +-- configmap-html -----> ConfigMap (welcome pg) |
|               +-- pre-install-job ----> Hook (pre-install)     |
|               +-- post-install-job ---> Hook (post-install)    |
|               +-- _helpers.tpl -------> Named templates        |
|               +-- NOTES.txt ----------> Post-install message   |
|                                                                |
|  charts/ ---------- redis (subchart dependency)                |
|                                                                |
+----------------------------------------------------------------+
                          |
                  helm package ---> my-nginx-app-0.1.0.tgz
                          |
                  GitHub Pages / OCI Registry
                          |
                  helm repo add / helm install
Quick Reference: Essential Helm Commands¶
| Command | Description |
|---|---|
| `helm create <name>` | Scaffold a new chart |
| `helm install <release> <chart>` | Install a chart |
| `helm upgrade <release> <chart>` | Upgrade a release |
| `helm upgrade --install` | Install or upgrade (idempotent) |
| `helm uninstall <release>` | Remove a release |
| `helm list` | List installed releases |
| `helm history <release>` | Show release revision history |
| `helm rollback <release> <rev>` | Rollback to a previous revision |
| `helm template <release> <chart>` | Render templates locally |
| `helm lint <chart>` | Check chart for errors |
| `helm package <chart>` | Package chart into .tgz |
| `helm repo index <dir>` | Generate repository index |
| `helm repo add <name> <url>` | Add a chart repository |
| `helm search repo <keyword>` | Search charts in added repos |
| `helm dependency update <chart>` | Download chart dependencies |
| `helm get values <release>` | Show computed values for a release |
| `helm get manifest <release>` | Show rendered manifests for a release |
16. Use helm status to Inspect a Release¶
Use helm status to view detailed information about a deployed release including resource status and NOTES.
Scenario:¶
- A release was deployed by another team member and you need to understand its current state.
- You want to see the NOTES.txt output again without reinstalling.
- helm status provides a quick overview of the release deployment status and health.
Hint: helm status, --revision, -o yaml
Solution
# 1. Install a release first
helm install my-nginx my-nginx-app/
# 2. Get the status of the release
helm status my-nginx
# Output shows:
# - Last deployment time
# - Release status (deployed, failed, pending, etc.)
# - Deployed resources
# - NOTES.txt content
# 3. Get status in YAML format
helm status my-nginx -o yaml
# 4. Get status in JSON format
helm status my-nginx -o json
# 5. Get status of a specific revision
helm status my-nginx --revision 1
# 6. Check status from a specific namespace
helm status my-nginx --namespace production
# Cleanup
helm uninstall my-nginx
17. Extract Values with helm get values¶
Use helm get values to see what values were actually used for a deployed release.
Scenario:¶
- You deployed a chart months ago with custom values and need to remember what overrides were applied.
- Multiple team members have upgraded the release and you want to know the current configuration.
- You need to replicate the same configuration in another environment.
Hint: helm get values, --all, --revision
Solution
# 1. Install with custom values
helm install my-nginx my-nginx-app/ \
--set replicaCount=3 \
--set welcomePage.title="Production App"
# 2. Get only the user-supplied values
helm get values my-nginx
# Output shows only the overrides:
# replicaCount: 3
# welcomePage:
# title: Production App
# 3. Get ALL values (including defaults from values.yaml)
helm get values my-nginx --all
# 4. Get values from a specific revision
helm upgrade my-nginx my-nginx-app/ --set replicaCount=5
helm get values my-nginx --revision 1
helm get values my-nginx --revision 2
# 5. Output as JSON
helm get values my-nginx -o json
# 6. Save values to file for reuse
helm get values my-nginx > my-nginx-values.yaml
# 7. Use saved values in another deployment
helm install my-nginx-copy my-nginx-app/ -f my-nginx-values.yaml
# Cleanup
helm uninstall my-nginx my-nginx-copy
18. Show Chart Values with helm show values¶
Use helm show values to inspect the default values of a chart before installing.
Scenario:¶
- You want to install a third-party chart from a repository but need to understand what configuration options are available.
- You’re evaluating multiple charts and want to compare their configuration interfaces.
- You need to create a custom values file but want to start from the defaults.
Hint: helm show values, chart repositories
Solution
# 1. Show default values of a local chart
helm show values my-nginx-app/
# 2. Show values from a packaged chart
helm package my-nginx-app/
helm show values my-nginx-app-0.1.0.tgz
# 3. Add a public chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami
# 4. Show default values from a repository chart
helm show values bitnami/nginx
# 5. Show values at a specific chart version
helm show values bitnami/nginx --version 15.0.0
# 6. Save default values to file for customization
helm show values bitnami/nginx > nginx-defaults.yaml
# 7. Compare values between chart versions
helm show values bitnami/nginx --version 14.0.0 > nginx-v14-values.yaml
helm show values bitnami/nginx --version 15.0.0 > nginx-v15-values.yaml
diff nginx-v14-values.yaml nginx-v15-values.yaml
# 8. Show all chart information (Chart.yaml + README + values)
helm show all bitnami/nginx
helm show chart bitnami/nginx
helm show readme bitnami/nginx
19. Search Charts with helm search repo¶
Use helm search repo to find charts in added repositories.
Scenario:¶
- You need to deploy PostgreSQL but don’t want to write manifests from scratch.
- You want to find and compare available charts for a specific technology.
- You need to discover what versions of a chart are available.
Hint: helm search repo, --versions, --version
Solution
# 1. Add popular chart repositories
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add stable https://charts.helm.sh/stable
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
# 2. Update repository index
helm repo update
# 3. Search for charts by keyword
helm search repo nginx
# 4. Search showing all available versions
helm search repo nginx --versions
# 5. Search with version constraint
helm search repo nginx --version "~15.0"
# 6. Search for development/pre-release versions
helm search repo nginx --devel
# 7. Search with a regex pattern (requires the --regexp flag;
#    the default search is keyword-based)
helm search repo --regexp 'nginx.*'
# 8. Search across all repositories with output formatting
helm search repo postgresql -o json
helm search repo postgresql -o yaml
# 9. Search and filter with grep
helm search repo database | grep -i postgres
# 10. List all charts from a specific repository
helm search repo bitnami/
# 11. Show detailed output
helm search repo nginx --max-col-width 0
# 12. Install a found chart
helm search repo bitnami/redis --versions | head -5
helm install my-redis bitnami/redis --version 18.0.0
helm uninstall my-redis
20. Update Repositories with helm repo update¶
Use helm repo update to refresh the local cache of chart information.
Scenario:¶
- A new version of a chart was released but helm search doesn’t show it.
- You haven’t updated your repository index in weeks and want the latest charts.
- Similar to apt update or yum update, you need to sync the latest metadata.
Hint: helm repo update, helm repo list
Solution
# 1. List all configured repositories
helm repo list
# 2. Update all repositories
helm repo update
# Output shows each repository being refreshed:
# Hang tight while we grab the latest from your chart repositories...
# ...Successfully got an update from the "bitnami" chart repository
# ...Successfully got an update from the "stable" chart repository
# Update Complete.
# 3. Update a specific repository
helm repo update bitnami
# 4. Update multiple specific repositories
helm repo update bitnami stable
# 5. Fail the update if any repository cannot be reached
#    (by default, Helm skips a failing repository and continues)
helm repo update --fail-on-repo-update-fail
# 6. Verify you can now see newer chart versions
helm search repo bitnami/nginx --versions | head -5
# 7. Typical workflow: update before searching or installing
helm repo update
helm search repo redis
helm install my-redis bitnami/redis
# Cleanup
helm uninstall my-redis
21. Run Chart Tests with helm test¶
Use helm test to run tests defined in the chart’s templates/tests/ directory.
Scenario:¶
- You deployed a release and want to verify it’s actually working correctly.
- The chart includes test pods that validate connectivity, configuration, or functionality.
- You want to include release validation in your CI/CD pipeline.
Hint: helm test, templates/tests/, helm.sh/hook: test
Solution
# 1. Create a test template if not already present
cat > my-nginx-app/templates/tests/test-connection.yaml << 'TEMPLATE'
apiVersion: v1
kind: Pod
metadata:
name: {{ include "my-nginx-app.fullname" . }}-test-connection
labels:
{{- include "my-nginx-app.labels" . | nindent 4 }}
annotations:
"helm.sh/hook": test
spec:
containers:
- name: wget
image: busybox
command: ['wget']
args: ['{{ include "my-nginx-app.fullname" . }}:{{ .Values.service.port }}']
restartPolicy: Never
TEMPLATE
# 2. Install the release
helm install my-nginx my-nginx-app/
# 3. Run the tests
helm test my-nginx
# Output shows:
# NAME: my-nginx
# ...
# Phase: Succeeded
# 4. Run tests with logs displayed
helm test my-nginx --logs
# 5. Run tests with timeout
helm test my-nginx --timeout 2m
# 6. View test pod logs manually
kubectl logs my-nginx-my-nginx-app-test-connection
# 7. Filter which tests to run (if multiple tests exist)
helm test my-nginx --filter name=test-connection
# 8. Clean up test pods after running (by default they remain).
#    Note: helm.sh/hook is an annotation, not a label, so you cannot
#    select the pod with -l; delete it by name instead:
kubectl delete pod my-nginx-my-nginx-app-test-connection
# Or have Helm delete it automatically by adding to the test template:
# "helm.sh/hook-delete-policy": "hook-succeeded,hook-failed"
# Cleanup
helm uninstall my-nginx
22. Use helm get all to Retrieve Complete Release Info¶
Use helm get all to retrieve all information about a deployed release in one command.
Scenario:¶
- You need to debug why a release isn’t working correctly.
- You want to export the complete release configuration for documentation or backup.
- You need to see the rendered manifests, computed values, and hooks all together.
Hint: helm get all, helm get manifest, helm get hooks
Solution
# 1. Install a release
helm install my-nginx my-nginx-app/ \
--set replicaCount=2 \
--set welcomePage.title="Debug Test"
# 2. Get all information about the release
helm get all my-nginx
# Output includes:
# - Release metadata
# - User-supplied values
# - Computed values
# - Rendered Kubernetes manifests
# - Hooks
# - Notes
# 3. Get all info from specific revision
helm upgrade my-nginx my-nginx-app/ --set replicaCount=3
helm get all my-nginx --revision 1
helm get all my-nginx --revision 2
# 4. Get individual components
helm get manifest my-nginx # Just the rendered manifests
helm get values my-nginx # Just the user values
helm get hooks my-nginx # Just the hooks
helm get notes my-nginx # Just the NOTES.txt
# 5. Export to file for backup/documentation
helm get all my-nginx > my-nginx-release-backup.yaml
# 6. Use template to extract specific information
helm get all my-nginx --template '{{.Release.Manifest}}'
# 7. Compare two revisions
helm get all my-nginx --revision 1 > rev1.yaml
helm get all my-nginx --revision 2 > rev2.yaml
diff rev1.yaml rev2.yaml
# Cleanup
helm uninstall my-nginx
rm -f rev1.yaml rev2.yaml my-nginx-release-backup.yaml
23. Use helm list with Filters and Formatting¶
Master helm list with various filters and output formats to manage multiple releases.
Scenario:¶
- You have dozens of releases across multiple namespaces and need to find specific ones.
- You want to script release management and need machine-readable output.
- You need to filter releases by status (deployed, failed, pending).
Hint: helm list, --all-namespaces, --filter, -o json
Solution
# 1. Install multiple releases for testing
helm install nginx-dev my-nginx-app/ --set replicaCount=1
helm install nginx-staging my-nginx-app/ --set replicaCount=2
helm install nginx-prod my-nginx-app/ --set replicaCount=3 --namespace prod --create-namespace
# 2. List all releases in current namespace
helm list
# 3. List releases across all namespaces
helm list --all-namespaces
# 4. List releases in specific namespace
helm list --namespace prod
# 5. Filter releases by name pattern
helm list --filter 'nginx-.*'
helm list --filter 'nginx-dev'
# 6. Show only deployed releases
helm list --deployed
# 7. Show all releases including uninstalled (with --keep-history)
helm list --all
# 8. Show failed releases
helm list --failed
# 9. Show pending releases
helm list --pending
# 10. Output as JSON (for scripting)
helm list -o json
# 11. Output as YAML
helm list -o yaml
# 12. Show extended information
helm list --all-namespaces -o wide
# 13. Limit number of results
helm list --max 5
# 14. Sort by date
helm list --date
# 15. Reverse sort order
helm list --reverse
# 16. Show specific columns only (use with jq for JSON output)
helm list -o json | jq '.[] | {name: .name, status: .status, namespace: .namespace}'
# 17. Count releases
helm list --all-namespaces | wc -l
# 18. Find releases using specific chart
helm list --all-namespaces -o json | jq '.[] | select(.chart | contains("my-nginx-app"))'
# Cleanup
helm uninstall nginx-dev nginx-staging
helm uninstall nginx-prod -n prod
kubectl delete namespace prod
24. Chain Multiple Commands for Release Management¶
Practice chaining Helm commands for common workflows and debugging scenarios.
Scenario:¶
- You need to quickly deploy, verify, and troubleshoot releases in rapid iteration cycles.
- You want to create reusable scripts for release management.
- You need to validate deployments in CI/CD pipelines.
Hint: Combine install, status, get values, test, upgrade, rollback
Solution
# ── Workflow 1: Install, verify, test ──
helm install my-nginx my-nginx-app/ && \
helm status my-nginx && \
helm test my-nginx
# ── Workflow 2: Lint, dry-run, then install ──
helm lint my-nginx-app/ && \
helm install my-nginx my-nginx-app/ --dry-run --debug && \
helm install my-nginx my-nginx-app/
# ── Workflow 3: Template, validate, install ──
helm template my-nginx my-nginx-app/ | kubectl apply --dry-run=client -f - && \
helm install my-nginx my-nginx-app/
# ── Workflow 4: Upgrade or install (idempotent) ──
helm upgrade --install my-nginx my-nginx-app/ --wait --timeout 5m
# ── Workflow 5: Upgrade with backup and rollback on failure ──
helm get values my-nginx > backup-values.yaml && \
helm upgrade my-nginx my-nginx-app/ --set replicaCount=5 --atomic
# ── Workflow 6: Install, check status, get all info ──
helm install my-nginx my-nginx-app/ && \
sleep 10 && \
helm status my-nginx && \
helm get all my-nginx
# ── Workflow 7: Compare before and after upgrade ──
helm get values my-nginx > before.yaml && \
helm upgrade my-nginx my-nginx-app/ --set newKey=newValue && \
helm get values my-nginx > after.yaml && \
diff before.yaml after.yaml
# ── Workflow 8: Install with custom values and verify ──
cat > custom.yaml << EOF
replicaCount: 3
welcomePage:
  title: "Production"
EOF
helm install my-nginx my-nginx-app/ -f custom.yaml && \
kubectl get pods -l app.kubernetes.io/instance=my-nginx
# ── Workflow 9: Rollback if tests fail ──
helm upgrade my-nginx my-nginx-app/ --set replicaCount=10 && \
helm test my-nginx || helm rollback my-nginx
# ── Workflow 10: Clean reinstall ──
helm uninstall my-nginx 2>/dev/null || true && \
helm install my-nginx my-nginx-app/ --wait
# ── Workflow 11: Multi-environment deployment ──
for env in dev staging prod; do
helm upgrade --install my-nginx-$env my-nginx-app/ \
-f values-$env.yaml \
--namespace $env --create-namespace
done
# List all deployments
helm list --all-namespaces
# ── Workflow 12: Debugging failed release ──
helm status my-nginx && \
helm get values my-nginx --all && \
helm get manifest my-nginx | kubectl apply --dry-run=client -f - && \
kubectl describe pods -l app.kubernetes.io/instance=my-nginx
# Cleanup all
for env in dev staging prod; do
helm uninstall my-nginx-$env -n $env 2>/dev/null || true
kubectl delete namespace $env 2>/dev/null || true
done
helm uninstall my-nginx 2>/dev/null || true
rm -f backup-values.yaml before.yaml after.yaml custom.yaml
Kubernetes ArgoCD Tasks¶
- Hands-on Kubernetes exercises covering ArgoCD installation, CLI usage, application deployment, GitOps workflows, and the App of Apps pattern.
- Each task includes a description, scenario, and a detailed solution with step-by-step instructions.
- Practice these tasks to master ArgoCD from initial installation to advanced multi-app orchestration.
Table of Contents¶
- 01. Install ArgoCD via Helm
- 02. Expose ArgoCD via Ingress
- 03. Login with the ArgoCD CLI
- 04. Deploy Your First Application via CLI
- 05. Inspect Application Status and Health
- 06. Manually Trigger a Sync
- 07. Diff Live State Against Git
- 08. Enable Auto-Sync with Self-Heal and Auto-Prune
- 09. Test Self-Healing
- 10. View Deployment History
- 11. Rollback an Application
- 12. Deploy a Helm Chart via ArgoCD
- 13. Deploy from Kustomize via ArgoCD
- 14. Connect a Private Repository
- 15. The App of Apps Pattern
- 16. Use Sync Waves for Ordered Deployment
- 17. Manage Projects
- 18. Use Resource Hooks (PreSync / PostSync)
- 19. Troubleshoot a Failed Sync
- 20. Cleanup and Uninstall ArgoCD
- 21. Chain CLI Commands for Release Workflows
01. Install ArgoCD via Helm¶
Install ArgoCD on a Kubernetes cluster using the official Argo Helm chart.
Scenario:¶
- Your team has adopted GitOps and needs a central delivery platform for all Kubernetes workloads.
- You’ve chosen ArgoCD and need to install it on the cluster from scratch using Helm.
Hint: helm repo add, helm upgrade --install, kubectl get pods -n argocd
Solution
# 1. Add the Argo Helm repository
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update argo
# 2. Install ArgoCD in the argocd namespace (insecure mode: TLS terminated at Ingress)
helm upgrade --install argocd argo/argo-cd \
--namespace argocd \
--create-namespace \
--set server.insecure=true \
--wait
# 3. Verify all pods are Running
kubectl get pods -n argocd
# Expected output (all pods Running):
# NAME READY STATUS
# argocd-application-controller-0 1/1 Running
# argocd-dex-server-xxxx 1/1 Running
# argocd-redis-xxxx 1/1 Running
# argocd-repo-server-xxxx 1/1 Running
# argocd-server-xxxx 1/1 Running
# 4. Retrieve the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d; echo
# Save this password - you'll need it for the CLI and Web UI
# 5. Verify the ArgoCD CRDs were installed
kubectl get crd | grep argoproj
# Expected: applications.argoproj.io, appprojects.argoproj.io, ...
02. Expose ArgoCD via Ingress¶
Expose the ArgoCD API server using an Nginx Ingress so it is accessible via a hostname instead of port-forwarding.
Scenario:¶
- Port-forwarding is fine for development, but your team needs a stable URL to access the ArgoCD UI and CLI.
- You will create an Ingress pointing argocd.local at the ArgoCD server.
Prerequisites: Nginx Ingress Controller installed on the cluster.
Hint: argocd-ingress.yaml, /etc/hosts, kubectl apply
Solution
# 1. Create the Ingress manifest
cat > argocd-ingress.yaml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd-server-ingress
  namespace: argocd
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  ingressClassName: nginx
  rules:
    - host: argocd.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: argocd-server
                port:
                  number: 80
EOF
# 2. Apply the Ingress
kubectl apply -f argocd-ingress.yaml
# 3. Verify the Ingress was created
kubectl get ingress -n argocd
# 4. Get the Ingress IP (use node IP for Kind/Minikube)
INGRESS_IP=$(kubectl get nodes \
-o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
echo "Ingress IP: ${INGRESS_IP}"
# 5. Add the hostname to /etc/hosts
echo "${INGRESS_IP} argocd.local" | sudo tee -a /etc/hosts
# 6. Verify connectivity
curl -s -o /dev/null -w "%{http_code}" http://argocd.local
# Expected: 200
# Open in browser
open http://argocd.local
# Fallback: port-forward if Ingress is not available
kubectl port-forward svc/argocd-server -n argocd 8080:80 &
open http://localhost:8080
03. Login with the ArgoCD CLI¶
Install the ArgoCD CLI and authenticate to the server.
Scenario:¶
- You will use the ArgoCD CLI to manage applications, repositories, and sync policies from the terminal.
- Before any CLI operations, you must authenticate to the ArgoCD server.
Hint: brew install argocd, argocd login, argocd account update-password
Solution
# ── Step 1: Install the ArgoCD CLI ──
# macOS
brew install argocd
# Linux
curl -sSL -o argocd-linux-amd64 \
https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd
rm argocd-linux-amd64
# Verify installation
argocd version --client
# ── Step 2: Retrieve the admin password ──
ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d)
echo "Admin password: ${ARGOCD_PASSWORD}"
# ── Step 3: Login via Ingress ──
argocd login argocd.local \
--username admin \
--password "${ARGOCD_PASSWORD}" \
--insecure
# Login via port-forward (fallback)
argocd login localhost:8080 \
--username admin \
--password "${ARGOCD_PASSWORD}" \
--insecure
# ── Step 4: Verify login ──
argocd account get-user-info
# Output shows the currently logged-in user
argocd cluster list
# Should show: https://kubernetes.default.svc in-cluster ...
# ── Step 5: Change the admin password (recommended) ──
argocd account update-password \
--current-password "${ARGOCD_PASSWORD}" \
--new-password "MySecurePassword123!"
04. Deploy Your First Application via CLI¶
Create and deploy the classic ArgoCD guestbook example application using the CLI.
Scenario:¶
- You want to deploy a demo application to validate that ArgoCD can fetch from Git and deploy to the cluster.
- You will use the public argocd-example-apps repository and the guestbook path.
Hint: argocd app create, argocd app sync, kubectl port-forward
Solution
# 1. Create the application
argocd app create guestbook \
--repo https://github.com/argoproj/argocd-example-apps.git \
--path guestbook \
--dest-server https://kubernetes.default.svc \
--dest-namespace guestbook \
--sync-option CreateNamespace=true
# 2. Verify the application was created
argocd app list
# Output shows:
# NAME CLUSTER NAMESPACE STATUS HEALTH SYNCPOLICY ...
# guestbook in-cluster guestbook OutOfSync Missing <none>
# 3. Manually trigger the first sync (deploy to cluster)
argocd app sync guestbook
# 4. Wait for the application to become Healthy + Synced
argocd app wait guestbook --health --sync --timeout 120
# 5. Verify Kubernetes resources were created
kubectl get all -n guestbook
# Expected:
# pod/guestbook-ui-xxxx Running
# service/guestbook-ui
# deployment/guestbook-ui
# 6. Access the application
kubectl port-forward svc/guestbook-ui -n guestbook 8081:80 &
open http://localhost:8081
# Cleanup the port-forward
kill %1
05. Inspect Application Status and Health¶
Use the CLI to inspect the full status, health, and resource tree of a deployed application.
Scenario:¶
- A team member deployed an application and you need to understand its current state without touching the cluster directly.
- You want to see what Kubernetes resources ArgoCD is managing.
Hint: argocd app get, --refresh, --output tree
Solution
# 1. Get a summary of the application
argocd app get guestbook
# Output includes:
# Name: guestbook
# Project: default
# Server: https://kubernetes.default.svc
# Namespace: guestbook
# URL: http://argocd.local/applications/guestbook
# Repo: https://github.com/argoproj/argocd-example-apps.git
# Target: HEAD
# Path: guestbook
# SyncWindow: Sync Allowed
# Sync Policy: <none>
# Sync Status: Synced to HEAD
# Health Status: Healthy
#
# GROUP KIND NAMESPACE NAME STATUS HEALTH HOOK MESSAGE
# Service guestbook guestbook-ui Synced Healthy
# apps Deployment guestbook guestbook-ui Synced Healthy ...
# 2. Force-refresh from Git before displaying
argocd app get guestbook --refresh
# 3. Display as a resource tree
argocd app get guestbook --output tree
# 4. Output as JSON for scripting
argocd app get guestbook -o json
# 5. Get JSON and parse with jq
argocd app get guestbook -o json | \
jq '{name: .metadata.name, sync: .status.sync.status, health: .status.health.status}'
# 6. Watch live updates
watch argocd app get guestbook
# 7. Get all applications and their health at once
argocd app list -o wide
06. Manually Trigger a Sync¶
Practice manually triggering a sync and understand all the sync options available.
Scenario:¶
- You pushed a change to Git and want to immediately deploy it without waiting for the 3-minute poll interval.
- You also want to understand force sync, dry-run, and selective resource sync options.
Hint: argocd app sync, --dry-run, --force, --resource
Solution
# 1. Basic sync - apply the Git state to the cluster
argocd app sync guestbook
# 2. Sync and wait for completion with a timeout
argocd app sync guestbook --timeout 120
# 3. Dry-run - preview what would change without applying
argocd app sync guestbook --dry-run
# 4. Force sync - replace resources even if spec is unchanged
argocd app sync guestbook --force
# 5. Sync with pruning - delete resources removed from Git
argocd app sync guestbook --prune
# 6. Sync a specific resource only (avoids re-applying unchanged resources)
argocd app sync guestbook \
--resource apps:Deployment:guestbook-ui
# 7. Sync only resources matching a label
argocd app sync guestbook \
--label app=guestbook-ui
# 8. Sync with apply-out-of-sync-only (skip already-synced resources)
argocd app sync guestbook --apply-out-of-sync-only
# 9. Sync multiple applications at once
argocd app sync guestbook app-of-apps efk-stack
# 10. Check the sync status after sync
argocd app get guestbook | grep -E "Sync|Health"
07. Diff Live State Against Git¶
Use argocd app diff to see exactly what has drifted between the live cluster state and Git.
Scenario:¶
- A developer manually patched a running Deployment with kubectl edit and your monitoring shows the application is OutOfSync.
- Before syncing to fix the drift, you want to see exactly what changed.
Hint: argocd app diff, kubectl scale, drift detection
Solution
# 1. Deliberately introduce drift - manually scale the deployment
kubectl scale deployment guestbook-ui --replicas=5 -n guestbook
# 2. Wait for ArgoCD to detect the drift
sleep 10
argocd app get guestbook | grep -E "Sync|Health"
# Sync Status: OutOfSync
# 3. Show the diff - what changed in live vs Git
argocd app diff guestbook
# Output highlights the replica count change:
# ===== apps/Deployment guestbook/guestbook-ui ======
# 10 - replicas: 5 (live)
# 10 + replicas: 1 (desired from Git)
# 4. Diff against a specific Git revision
argocd app diff guestbook --revision HEAD~1
# 5. Diff only a specific resource
argocd app diff guestbook \
--resource apps:Deployment:guestbook-ui
# 6. Use in CI - exit non-zero if drift is detected
if ! argocd app diff guestbook --exit-code; then
echo "DRIFT DETECTED - syncing..."
argocd app sync guestbook
fi
# 7. Restore the desired state from Git
argocd app sync guestbook
kubectl get deployment guestbook-ui -n guestbook
# READY should be back to 1/1
08. Enable Auto-Sync with Self-Heal and Auto-Prune¶
Configure automated sync so ArgoCD continuously reconciles the cluster state with Git.
Scenario:¶
- Your team pushes application changes directly to Git and expects them to be deployed automatically.
- You also want ArgoCD to clean up resources removed from Git and heal any manual drift.
Hint: argocd app set --sync-policy automated, --self-heal, --auto-prune
Solution
# 1. Enable automated sync (ArgoCD polls Git every ~3 minutes)
argocd app set guestbook --sync-policy automated
# 2. Verify the sync policy was applied
argocd app get guestbook | grep "Sync Policy"
# Sync Policy: Automated
# 3. Add self-heal: restores Git state if cluster is manually modified
argocd app set guestbook --self-heal
# 4. Add auto-prune: deletes resources removed from Git
argocd app set guestbook --auto-prune
# 5. Verify all options are active
argocd app get guestbook | grep -E "Sync Policy|Prune|Self Heal"
# 6. Test auto-sync: manually break the state
kubectl scale deployment guestbook-ui --replicas=5 -n guestbook
echo "Waiting for ArgoCD self-heal..."
sleep 30
kubectl get deployment guestbook-ui -n guestbook
# READY should be restored to 1/1 by ArgoCD
# 7. Configure using app manifest equivalents (declarative approach)
# The equivalent spec in an Application YAML:
# spec:
# syncPolicy:
# automated:
# prune: true
# selfHeal: true
# syncOptions:
# - CreateNamespace=true
# 8. Disable auto-sync (switch back to manual)
argocd app set guestbook --sync-policy none
argocd app get guestbook | grep "Sync Policy"
# Sync Policy: <none>
09. Test Self-Healing¶
Validate that ArgoCD self-healing works by deliberately introducing drift and observing automatic recovery.
Scenario:¶
- A runbook says to test ArgoCD self-healing quarterly.
- You need to break the cluster state and confirm ArgoCD repairs it within the reconciliation window.
Hint: kubectl scale, kubectl delete, watch argocd app get
Solution
# 0. Ensure auto-sync + self-heal are enabled
argocd app set guestbook --sync-policy automated --self-heal --auto-prune
# ── Test 1: Scale drift ──
# Break it
kubectl scale deployment guestbook-ui --replicas=10 -n guestbook
echo "Breaking: scaled to 10 replicas"
# Watch ArgoCD detect and fix it (up to ~30s)
watch -n 5 "kubectl get deployment guestbook-ui -n guestbook && argocd app get guestbook | grep -E 'Status|Health'"
# After ~30 seconds, replicas will return to the value in Git
kubectl get deployment guestbook-ui -n guestbook
# DESIRED should match Git (e.g., 1)
# ── Test 2: Delete a managed resource ──
# Delete the service
kubectl delete service guestbook-ui -n guestbook
echo "Deleted the guestbook-ui service"
# ArgoCD detects the missing resource and recreates it
sleep 30
kubectl get service guestbook-ui -n guestbook
# Service should be recreated
# ── Test 3: Manual label change ──
# Add a label not in Git
kubectl label deployment guestbook-ui -n guestbook manual-change=true
# ArgoCD will detect and revert this within the next sync cycle
sleep 60
kubectl get deployment guestbook-ui -n guestbook --show-labels | grep manual-change
# Label should be gone
# ── Summary ──
argocd app get guestbook
# Health Status: Healthy
# Sync Status: Synced
10. View Deployment History¶
Use argocd app history to inspect the deployment history of an application.
Scenario:¶
- You need to audit which Git commits were deployed over the past month.
- You want to identify the revision ID to use for a rollback.
Hint: argocd app history, -o json, jq
Solution
# 1. Create some history by triggering multiple syncs
argocd app sync guestbook
argocd app sync guestbook
argocd app sync guestbook
# 2. View the deployment history
argocd app history guestbook
# Output shows each deployment:
# ID DATE REVISION
# 0 2026-02-22 10:00:00 +0000 UTC HEAD (abc1234)
# 1 2026-02-22 10:05:00 +0000 UTC HEAD (abc1234)
# 2 2026-02-22 10:10:00 +0000 UTC HEAD (abc1234)
# 3. Output as JSON for scripting
argocd app history guestbook -o json
# 4. Extract key fields with jq
argocd app history guestbook -o json | \
jq '.[] | {id: .id, date: .deployedAt, revision: .revision}'
# 5. Find the most recent deployment
argocd app history guestbook -o json | jq '.[-1]'
# 6. Find deployments by Git commit SHA
argocd app history guestbook -o json | \
jq '.[] | select(.revision | contains("abc1234"))'
# 7. Save history to file for an audit log
argocd app history guestbook -o json > guestbook-deploy-history.json
cat guestbook-deploy-history.json
11. Rollback an Application¶
Rollback an application to a previously deployed revision using the ArgoCD CLI.
Scenario:¶
- A recent deployment introduced a regression.
- You need to immediately revert to the last known-good revision to restore service.
Hint: argocd app history, argocd app rollback, argocd app set --sync-policy
Solution
# 1. Inspect the deployment history to choose a target revision
argocd app history guestbook
# Note the ID of the revision you want to roll back to.
# In this example, we'll rollback to revision ID 0.
# 2. Perform the rollback
argocd app rollback guestbook 0
# ArgoCD rolls back the cluster state to the snapshot from revision 0.
# NOTE: Rollback disables automated sync on the app to prevent
# ArgoCD from immediately re-syncing forward again.
# 3. Wait for the rollback to complete
argocd app wait guestbook --health --timeout 120
# 4. Verify the status
argocd app get guestbook
# 5. Verify the Kubernetes resources reflect the rolled-back state
kubectl get all -n guestbook
# 6. Check history - rollback is recorded as a new entry
argocd app history guestbook
# 7. Re-enable auto-sync after the incident is resolved
argocd app set guestbook \
--sync-policy automated \
--self-heal \
--auto-prune
# 8. Confirm the app is back to Synced + Healthy
argocd app get guestbook | grep -E "Sync|Health"
12. Deploy a Helm Chart via ArgoCD¶
Use ArgoCD to deploy a Helm chart from a chart repository, with custom values managed in Git.
Scenario:¶
- You want ArgoCD to own the lifecycle of a Helm release, including upgrades and drift detection.
- Custom values.yaml overrides are stored in Git so changes go through GitOps.
Hint: argocd app create --helm-chart, --helm-set, --revision
Solution
# ── Option A: Deploy a Helm chart from a chart repository ──
argocd app create nginx-helm \
--repo https://charts.bitnami.com/bitnami \
--helm-chart nginx \
--revision 15.1.0 \
--dest-server https://kubernetes.default.svc \
--dest-namespace nginx-helm \
--sync-option CreateNamespace=true \
--helm-set service.type=ClusterIP \
--helm-set replicaCount=2
# Sync and wait
argocd app sync nginx-helm
argocd app wait nginx-helm --health --timeout 120
# ── Option B: Deploy a Helm chart stored in a Git repository ──
# Store values overrides in Git, e.g.:
# my-repo/nginx/values.yaml
# apiVersion: argoproj.io/v1alpha1 ← not needed, ArgoCD auto-detects Helm
argocd app create nginx-git-helm \
--repo https://github.com/my-org/my-charts.git \
--path nginx \
--dest-server https://kubernetes.default.svc \
--dest-namespace nginx-git-helm \
--sync-option CreateNamespace=true
# ── Update Helm values through CLI (without changing Git) ──
argocd app set nginx-helm \
--helm-set replicaCount=3 \
--helm-set image.tag=1.25.0
argocd app sync nginx-helm
# ── Verify ──
argocd app get nginx-helm | grep -E "Sync|Health|Revision"
kubectl get deployment -n nginx-helm
# ── Cleanup ──
argocd app delete nginx-helm --yes
kubectl delete namespace nginx-helm
13. Deploy from Kustomize via ArgoCD¶
Use ArgoCD to deploy a Kustomize-based application, showing how ArgoCD auto-detects the tool.
Scenario:¶
- Your team uses Kustomize overlays to manage configuration across environments (base + overlays).
- You want ArgoCD to render and deploy the Kustomize manifests for a specific overlay.
Hint: ArgoCD auto-detects Kustomize from kustomization.yaml. Point --path to the overlay directory.
Solution
# 1. Create a minimal Kustomize app structure in your Git repo
# Structure:
# kustomize-demo/
# ├── base/
# │   ├── deployment.yaml
# │   ├── service.yaml
# │   └── kustomization.yaml
# └── overlays/
#     └── dev/
#         ├── replica-patch.yaml
#         └── kustomization.yaml
# 2. Create the application in ArgoCD pointing at a Kustomize overlay
argocd app create kustomize-demo \
--repo https://github.com/argoproj/argocd-example-apps.git \
--path kustomize-guestbook \
--dest-server https://kubernetes.default.svc \
--dest-namespace kustomize-demo \
--sync-option CreateNamespace=true
# ArgoCD detects kustomization.yaml and uses `kustomize build` to render manifests
# 3. Sync the application
argocd app sync kustomize-demo
argocd app wait kustomize-demo --health --timeout 120
# 4. View rendered manifests (what kustomize build produced)
argocd app manifests kustomize-demo
# 5. Verify resources
kubectl get all -n kustomize-demo
# 6. Apply a Kustomize image override via CLI
argocd app set kustomize-demo \
--kustomize-image gcr.io/argoproj/argocd-example-apps/guestbook-ui:v0.2
argocd app sync kustomize-demo
# 7. Cleanup
argocd app delete kustomize-demo --yes
kubectl delete namespace kustomize-demo
14. Connect a Private Repository¶
Add a private Git repository to ArgoCD using an HTTPS token or SSH key.
Scenario:¶
- Your application manifests live in a private GitHub repository.
- ArgoCD needs credentials to clone the repository in order to deploy from it.
Hint: argocd repo add, --username, --password, --ssh-private-key-path
Solution
# ── Option A: Connect via HTTPS Personal Access Token (PAT) ──
# Create a GitHub PAT with 'repo' scope at https://github.com/settings/tokens
argocd repo add https://github.com/my-org/private-repo.git \
--username git \
--password <YOUR_PAT_HERE>
# ── Option B: Connect via SSH Key ──
# Generate a deploy key (no passphrase)
ssh-keygen -t ed25519 -C "argocd-deploy-key" -f ~/.ssh/argocd-deploy-key -N ""
# Add the public key to GitHub repo:
# GitHub Repo → Settings → Deploy Keys → Add Deploy Key
# Paste the contents of ~/.ssh/argocd-deploy-key.pub
# Add the private key to ArgoCD
argocd repo add git@github.com:my-org/private-repo.git \
--ssh-private-key-path ~/.ssh/argocd-deploy-key
# ── Option C: Add a private Helm chart repository ──
argocd repo add https://my-private-charts.example.com \
--type helm \
--name private-charts \
--username admin \
--password <PASSWORD>
# ── Verify the connection ──
argocd repo list
# Expected output shows STATUS: Successful
# SERVER TYPE STATUS MESSAGE
# https://github.com/my-org/private-repo.git git Successful
# ── Use the private repo in an application ──
argocd app create my-private-app \
--repo https://github.com/my-org/private-repo.git \
--path manifests \
--dest-server https://kubernetes.default.svc \
--dest-namespace my-app
# ── Remove a repository ──
argocd repo rm https://github.com/my-org/private-repo.git
15. The App of Apps Pattern¶
Use a single root Application to manage a directory of child Application manifests declaratively.
Scenario:¶
- You have many microservices and want a single GitOps entry point.
- Adding or removing an app is as simple as committing or deleting a YAML file in Git.
- The App of Apps pattern makes fleet management fully declarative.
Hint: argocd app create pointing at a directory of Application YAMLs, argocd app list
Solution
# ── Step 1: Create child Application manifests and commit them to Git ──
# apps/guestbook.yaml
cat > /tmp/guestbook.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps.git
    targetRevision: HEAD
    path: guestbook
  destination:
    server: https://kubernetes.default.svc
    namespace: guestbook
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
EOF
# apps/nginx.yaml
cat > /tmp/nginx.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nginx-demo
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps.git
    targetRevision: HEAD
    path: nginx-ingress
  destination:
    server: https://kubernetes.default.svc
    namespace: nginx-demo
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
EOF
# Commit both files to the apps/ directory of your Git repository
# ── Step 2: Create the root App of Apps ──
argocd app create app-of-apps \
--repo https://github.com/my-org/my-gitops-repo.git \
--path apps \
--dest-server https://kubernetes.default.svc \
--dest-namespace argocd \
--sync-policy automated \
--auto-prune \
--self-heal
# ── Step 3: Sync the root app ──
argocd app sync app-of-apps
# ArgoCD discovers all YAML files in apps/ and creates child Applications
# ── Step 4: Verify all child apps were created ──
argocd app list
# Expected:
# NAME CLUSTER NAMESPACE STATUS HEALTH SYNCPOLICY
# app-of-apps in-cluster argocd Synced Healthy Auto-Prune
# guestbook in-cluster guestbook Synced Healthy Auto-Prune
# nginx-demo in-cluster nginx-demo Synced Healthy Auto-Prune
# ── Step 5: Add a new application (GitOps way) ──
# Commit a new YAML file to the apps/ directory in Git.
# ArgoCD detects the change and automatically creates the child Application.
# No kubectl or argocd commands needed!
# ── Step 6: Remove an application (GitOps way) ──
# Delete the YAML file from apps/ in Git and commit.
# With auto-prune enabled, ArgoCD deletes the Application and its resources.
16. Use Sync Waves for Ordered Deployment¶
Control the order in which resources are synced during a deployment using sync wave annotations.
Scenario:¶
- You have a database, a backend API, and a frontend that must start in order.
- Sync waves let you define phases so that each component waits for the previous one to become healthy.
Hint: argocd.argoproj.io/sync-wave annotation, wave numbers
Solution
# Sync waves are set as annotations on Kubernetes resources in Git.
# Resources in lower waves deploy and become healthy before higher waves start.
# ── Example: 3-tier app with ordered deployment ──
# wave 0: Namespace and ConfigMaps (no dependencies)
cat > /tmp/namespace.yaml << 'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: "0"
EOF
# wave 1: Database (must be healthy before the API starts)
cat > /tmp/database-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: my-app
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          env:
            - name: POSTGRES_PASSWORD
              value: "mysecretpassword"
EOF
# wave 2: Backend API (waits for database to be healthy)
cat > /tmp/api-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend-api
namespace: my-app
annotations:
argocd.argoproj.io/sync-wave: "2"
spec:
replicas: 2
selector:
matchLabels:
app: backend-api
template:
metadata:
labels:
app: backend-api
spec:
containers:
- name: api
image: my-api:latest
EOF
# wave 3: Frontend (waits for the API to be healthy)
cat > /tmp/frontend-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
namespace: my-app
annotations:
argocd.argoproj.io/sync-wave: "3"
spec:
replicas: 3
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
spec:
containers:
- name: frontend
image: my-frontend:latest
EOF
# Commit all files to a Git path, then create the app:
argocd app create my-app \
--repo https://github.com/my-org/my-repo.git \
--path manifests \
--dest-server https://kubernetes.default.svc \
--dest-namespace my-app \
--sync-option CreateNamespace=true
argocd app sync my-app
# Watch the wave-by-wave deployment progress
watch argocd app get my-app
# Wave execution order:
# Wave 0: Namespace created
# Wave 1: postgres Deployment reaches Healthy
# Wave 2: backend-api Deployment reaches Healthy
# Wave 3: frontend Deployment reaches Healthy
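The wave ordering can be reasoned about off-cluster: ArgoCD sorts resources by the integer value of the sync-wave annotation (resources without it default to wave 0) and waits for each wave to become healthy before starting the next. A minimal Python sketch of that sort, using the resources from this task:

```python
# Sketch: order resources by their argocd.argoproj.io/sync-wave annotation.
# Resources without the annotation default to wave 0, as ArgoCD does.
WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"

resources = [
    {"kind": "Deployment", "name": "frontend",
     "annotations": {WAVE_ANNOTATION: "3"}},
    {"kind": "Deployment", "name": "postgres",
     "annotations": {WAVE_ANNOTATION: "1"}},
    {"kind": "Namespace", "name": "my-app",
     "annotations": {WAVE_ANNOTATION: "0"}},
    {"kind": "Deployment", "name": "backend-api",
     "annotations": {WAVE_ANNOTATION: "2"}},
]

def wave(resource):
    """Return the sync wave of a resource (default 0)."""
    return int(resource.get("annotations", {}).get(WAVE_ANNOTATION, "0"))

ordered = sorted(resources, key=wave)
for r in ordered:
    print(f"wave {wave(r)}: {r['kind']}/{r['name']}")
```

Each wave here corresponds to one "deploy and wait for Healthy" phase in the sync.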
17. Manage Projects¶
Create an ArgoCD Project to restrict what repositories, clusters, and namespaces an application can use.
Scenario:¶
- Your cluster hosts applications for multiple teams (frontend, backend, ops).
- You want to prevent the frontend team from accidentally deploying to the kube-system namespace.
- ArgoCD Projects provide RBAC-level isolation between teams.
Hint: argocd proj create, --src-repos, --dest, argocd proj list
Solution
# 1. Create a project for the frontend team
argocd proj create frontend \
--description "Frontend team applications" \
--src-repos "https://github.com/my-org/frontend-repo.git" \
--dest "https://kubernetes.default.svc,frontend-*" \
--dest "https://kubernetes.default.svc,staging"
# --src-repos: only this repo is allowed as a source
# --dest: only these patterns are allowed as destinations (cluster,namespace)
# 2. Verify the project was created
argocd proj list
# 3. View project details
argocd proj get frontend
# 4. Add additional allowed source repositories
argocd proj add-source frontend \
"https://github.com/my-org/shared-charts.git"
# 5. Add allowed destinations
argocd proj add-destination frontend \
https://kubernetes.default.svc production-frontend
# 6. Restrict cluster-scoped resources: deny everything, then allow only Namespaces
argocd proj deny-cluster-resource frontend "*" "*"
argocd proj allow-cluster-resource frontend "" "Namespace"
# 7. Assign an application to the project
argocd app create frontend-app \
--project frontend \
--repo https://github.com/my-org/frontend-repo.git \
--path manifests \
--dest-server https://kubernetes.default.svc \
--dest-namespace frontend-prod
# 8. Attempting to use a disallowed repo will fail with a permission error
argocd app create bad-app \
--project frontend \
--repo https://github.com/other-org/other-repo.git \
--path manifests \
--dest-server https://kubernetes.default.svc \
--dest-namespace kube-system
# Error: application destination {... kube-system} is not permitted in project 'frontend'
# 9. Cleanup
argocd app delete frontend-app --yes 2>/dev/null || true
argocd proj delete frontend
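The CLI steps above have a declarative equivalent: an AppProject manifest kept in Git. A sketch mirroring the same restrictions (URLs and namespace patterns are the placeholders used in this task):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: frontend
  namespace: argocd
spec:
  description: Frontend team applications
  sourceRepos:
    - https://github.com/my-org/frontend-repo.git
    - https://github.com/my-org/shared-charts.git
  destinations:
    - server: https://kubernetes.default.svc
      namespace: frontend-*
    - server: https://kubernetes.default.svc
      namespace: staging
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
```

Managing projects this way keeps team boundaries under the same Git review process as the applications themselves.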
18. Use Resource Hooks (PreSync / PostSync)¶
Use ArgoCD resource hooks to run Jobs before or after a sync operation - e.g., database migrations or smoke tests.
Scenario:¶
- Your application requires a database migration to run before the new version starts.
- After deployment you want a smoke test to verify the application is responding correctly.
Hint: argocd.argoproj.io/hook annotation, PreSync, PostSync, argocd.argoproj.io/hook-delete-policy
Solution
# Hooks are standard Kubernetes Jobs with special ArgoCD annotations.
# They are stored in your Git repository alongside the application manifests.
# ── PreSync Hook: Run database migration before sync ──
cat > /tmp/pre-sync-migration.yaml << 'EOF'
apiVersion: batch/v1
kind: Job
metadata:
name: db-migration
namespace: my-app
annotations:
argocd.argoproj.io/hook: PreSync
argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
template:
spec:
restartPolicy: Never
containers:
- name: migrate
image: my-app:latest
command: ["./migrate.sh"]
env:
- name: DB_HOST
value: postgres.my-app.svc
EOF
# ── PostSync Hook: Run smoke test after sync ──
cat > /tmp/post-sync-smoke-test.yaml << 'EOF'
apiVersion: batch/v1
kind: Job
metadata:
name: smoke-test
namespace: my-app
annotations:
argocd.argoproj.io/hook: PostSync
argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
template:
spec:
restartPolicy: Never
containers:
- name: smoke-test
image: curlimages/curl:latest
command:
- sh
- -c
- |
echo "Running smoke test..."
until curl -sf http://my-app.my-app.svc/health; do
echo "Service not ready, retrying in 5s..."
sleep 5
done
echo "Smoke test passed!"
EOF
# ── Sync Wave + Hook combination ──
# Use sync waves to order hooks relative to other resources:
# annotations:
# argocd.argoproj.io/hook: PreSync
# argocd.argoproj.io/sync-wave: "-5" # Run early within PreSync phase
# ── Hook Delete Policies ──
# HookSucceeded: Delete the job when it succeeds
# HookFailed: Delete the job when it fails
# BeforeHookCreation: Delete the previous run before creating a new one (default)
# ── Commit the hook files to Git and sync ──
argocd app sync my-app
# Watch hooks execute
kubectl get jobs -n my-app -w
# Check hook logs
kubectl logs job/db-migration -n my-app
kubectl logs job/smoke-test -n my-app
19. Troubleshoot a Failed Sync¶
Diagnose and fix a sync failure using CLI commands and kubectl.
Scenario:¶
- An application is stuck in OutOfSync or Degraded state.
- You need to identify the root cause and resolve it.
Hint: argocd app get, argocd app diff, kubectl describe, kubectl logs
Solution
# ── Step 1: Get the high-level status ──
argocd app get <app-name>
# Look for degraded resources or error messages in the resource list
# ── Step 2: Show the diff to understand what ArgoCD is trying to apply ──
argocd app diff <app-name>
# ── Step 3: Get the rendered manifests ──
argocd app manifests <app-name>
# Validate the manifest looks correct
# ── Step 4: Check ArgoCD conditions and events ──
kubectl describe application <app-name> -n argocd
# Look at Conditions and Events sections
# ── Step 5: Check the ArgoCD application controller logs ──
kubectl logs -n argocd \
-l app.kubernetes.io/name=argocd-application-controller \
--tail=100 | grep -i "error\|failed\|<app-name>"
# ── Step 6: Check the repo-server logs (manifest rendering issues) ──
kubectl logs -n argocd \
-l app.kubernetes.io/name=argocd-repo-server \
--tail=50 | grep -i "error\|failed"
# ── Step 7: Force-refresh and retry sync ──
argocd app get <app-name> --refresh
argocd app sync <app-name> --force
# ── Step 8: Common issues and fixes ──
# Issue: Repository error (auth failure)
argocd repo list # Check STATUS column
argocd repo get <repo-url> # Check detailed status
# Issue: Out of sync but diff shows no changes (stuck sync)
argocd app sync <app-name> --force --replace
# Issue: Hook is stuck running
kubectl get jobs -n <namespace>
kubectl delete job <stuck-job-name> -n <namespace>
argocd app sync <app-name>
# Issue: Resource exists with different owner (e.g., fields managed by another controller)
argocd app sync <app-name> --server-side-apply
# Issue: Namespace doesn't exist
argocd app set <app-name> --sync-option CreateNamespace=true
argocd app sync <app-name>
# ── Step 9: App of Apps - child apps not created ──
argocd app get app-of-apps # Check root is Synced
argocd repo list # Confirm repo is accessible
argocd app manifests app-of-apps # Confirm apps/ dir renders correctly
kubectl get applications -n argocd # Check all Application CRs
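The status checks above lend themselves to scripting. An illustrative Python sketch (with a hard-coded stand-in for `argocd app list -o json` output, trimmed to the fields being read) that collects applications needing attention by sync and health status:

```python
import json

# Hypothetical, trimmed `argocd app list -o json` output used as test data.
raw = json.dumps([
    {"metadata": {"name": "guestbook"},
     "status": {"sync": {"status": "Synced"},
                "health": {"status": "Healthy"}}},
    {"metadata": {"name": "nginx-demo"},
     "status": {"sync": {"status": "OutOfSync"},
                "health": {"status": "Degraded"}}},
])

# An app needs attention if it is not both Synced and Healthy.
needs_attention = [
    app["metadata"]["name"]
    for app in json.loads(raw)
    if app["status"]["sync"]["status"] != "Synced"
    or app["status"]["health"]["status"] != "Healthy"
]
print(needs_attention)
```

In a real pipeline you would feed the live CLI output into the same filter instead of the hard-coded sample.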
20. Cleanup and Uninstall ArgoCD¶
Safely remove all ArgoCD applications and uninstall ArgoCD from the cluster.
Scenario:¶
- You’ve finished a demo or training environment and need to tear everything down cleanly.
- Resources must be deleted in the correct order to avoid orphaned namespaces or finalizer deadlocks.
Hint: argocd app delete --cascade, helm uninstall, finalizer removal
Solution
# ── Step 1: Delete all managed applications (cascade removes K8s resources too) ──
# List all applications first
argocd app list
# Delete individual apps with cascade
argocd app delete guestbook --yes
argocd app delete app-of-apps --yes
# Or delete ALL applications in one command
argocd app list -o name | xargs -I {} argocd app delete {} --yes
# ── Step 2: Verify managed namespaces were cleaned up ──
kubectl get namespace | grep -E "guestbook|efk|nginx"
# ── Step 3: Remove connected repositories ──
argocd repo list -o json | jq -r '.[].repo' | xargs -I {} argocd repo rm {}
# ── Step 4: Remove custom Projects (if any were created) ──
argocd proj list | awk 'NR>1 && $1 != "default" {print $1}' | \
xargs -I {} argocd proj delete {}
# ── Step 5: If apps are stuck due to finalizers, remove them manually ──
# List all Application CRs
kubectl get applications -n argocd
# Remove a stuck application's finalizer
kubectl patch application <app-name> -n argocd \
-p '{"metadata":{"finalizers":[]}}' \
--type merge
# ── Step 6: Uninstall ArgoCD via Helm ──
helm uninstall argocd --namespace argocd
# ── Step 7: Delete the ArgoCD namespace and CRDs ──
kubectl delete namespace argocd
# Delete ArgoCD CRDs
kubectl get crd | grep argoproj.io | awk '{print $1}' | \
xargs kubectl delete crd
# ── Step 8: Verify everything is gone ──
kubectl get all -n argocd # Should return "No resources found"
kubectl get crd | grep argoproj # Should return nothing
helm list --all-namespaces # argocd should not appear
21. Chain CLI Commands for Release Workflows¶
Practice common multi-step ArgoCD CLI workflows for day-to-day GitOps operations.
Scenario:¶
- You want repeatable, scriptable workflows for deploying, validating, and rolling back GitOps applications.
- These one-liners and scripts model real-world CI/CD integration patterns.
Hint: Chain argocd and kubectl commands with &&, ||, and loops.
Solution
# ── Workflow 1: Install ArgoCD, login, and deploy guestbook in one sequence ──
helm upgrade --install argocd argo/argo-cd \
--namespace argocd --create-namespace \
--set server.insecure=true --wait && \
PASS=$(kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d) && \
argocd login argocd.local \
--username admin --password "$PASS" --insecure && \
argocd app create guestbook \
--repo https://github.com/argoproj/argocd-example-apps.git \
--path guestbook \
--dest-server https://kubernetes.default.svc \
--dest-namespace guestbook \
--sync-option CreateNamespace=true && \
argocd app sync guestbook && \
argocd app wait guestbook --health --timeout 120 && \
echo "Guestbook deployed successfully!"
# ── Workflow 2: Deploy and auto-heal setup ──
argocd app create guestbook \
--repo https://github.com/argoproj/argocd-example-apps.git \
--path guestbook \
--dest-server https://kubernetes.default.svc \
--dest-namespace guestbook \
--sync-policy automated \
--auto-prune \
--self-heal \
--sync-option CreateNamespace=true && \
argocd app wait guestbook --health && \
argocd app get guestbook
# ── Workflow 3: Health-check and rollback on failure ──
argocd app sync guestbook --timeout 120 && \
argocd app wait guestbook --health --timeout 60 || \
( echo "Deployment failed - rolling back." && argocd app rollback guestbook 0 )
# ── Workflow 4: Check all apps and alert on degraded ──
DEGRADED=$(argocd app list -o json | jq -r \
'.[] | select(.status.health.status != "Healthy") | .metadata.name')
if [ -n "$DEGRADED" ]; then
echo "ALERT: Degraded applications detected:"
echo "$DEGRADED"
else
echo "All applications are Healthy."
fi
# ── Workflow 5: Force-refresh and sync all out-of-sync apps ──
argocd app list -o json | \
jq -r '.[] | select(.status.sync.status == "OutOfSync") | .metadata.name' | \
xargs -I {} bash -c 'argocd app get {} --refresh && argocd app sync {}'
# ── Workflow 6: Deploy App of Apps and wait for all children ──
argocd app create app-of-apps \
--repo https://github.com/my-org/my-gitops-repo.git \
--path apps \
--dest-server https://kubernetes.default.svc \
--dest-namespace argocd \
--sync-policy automated && \
argocd app sync app-of-apps && \
sleep 10 && \
argocd app list
# ── Workflow 7: Export all application definitions for backup ──
mkdir -p argocd-backup
argocd app list -o name | while read APP; do
argocd app get "$APP" -o json > "argocd-backup/${APP}.json"
echo "Backed up: ${APP}"
done
ls -la argocd-backup/
# ── Workflow 8: Multi-environment deployment with different values ──
for ENV in dev staging prod; do
argocd app create "guestbook-${ENV}" \
--repo https://github.com/argoproj/argocd-example-apps.git \
--path guestbook \
--dest-server https://kubernetes.default.svc \
--dest-namespace "guestbook-${ENV}" \
--sync-policy automated \
--auto-prune \
--self-heal \
--sync-option CreateNamespace=true
echo "Created: guestbook-${ENV}"
done
argocd app list
# ── Workflow 9: Full teardown ──
argocd app list -o name | xargs -I {} argocd app delete {} --yes && \
helm uninstall argocd -n argocd && \
kubectl delete namespace argocd && \
echo "ArgoCD fully removed."
# Cleanup multi-env apps
for ENV in dev staging prod; do
kubectl delete namespace "guestbook-${ENV}" 2>/dev/null || true
done
rm -rf argocd-backup/
Diagram: ArgoCD GitOps Workflow¶
┌───────────────────────────────────────────────────────────────────────┐
│                          ArgoCD GitOps Flow                           │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  Developer ──► git push ──► Git Repository (Source of Truth)          │
│                                   │                                   │
│                       ArgoCD polls every ~3 min                       │
│                                   │                                   │
│  ┌────────────────────────────────▼─────────────────────────────┐     │
│  │                     ArgoCD Control Plane                     │     │
│  │                                                              │     │
│  │  API Server ◄── argocd CLI / Web UI                          │     │
│  │      │                                                       │     │
│  │  App Controller ──► compare desired vs live                  │     │
│  │      │                                                       │     │
│  │  Repo Server ──► renders Helm/Kustomize/YAML                 │     │
│  └────────────────────────────────┬─────────────────────────────┘     │
│                                   │                                   │
│                              sync / heal                              │
│                                   │                                   │
│  ┌────────────────────────────────▼─────────────────────────────┐     │
│  │                      Kubernetes Cluster                      │     │
│  │                                                              │     │
│  │  Namespace: guestbook ──► Deployment, Svc                    │     │
│  │  Namespace: efk       ──► Elasticsearch, …                   │     │
│  │  Namespace: argocd    ──► App of Apps                        │     │
│  └──────────────────────────────────────────────────────────────┘     │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
Quick Reference: Essential ArgoCD CLI Commands¶
| Command | Description |
|---|---|
| `argocd login <server>` | Authenticate to ArgoCD server |
| `argocd account update-password` | Change the admin password |
| `argocd cluster list` | List connected clusters |
| `argocd repo add <url>` | Connect a Git or Helm repository |
| `argocd repo list` | List all connected repositories |
| `argocd app create <name>` | Create a new application |
| `argocd app list` | List all applications and their status |
| `argocd app get <name>` | Get detailed status and resource tree |
| `argocd app get <name> --refresh` | Force-refresh from Git before displaying |
| `argocd app sync <name>` | Manually trigger a sync |
| `argocd app sync <name> --dry-run` | Preview a sync without applying |
| `argocd app diff <name>` | Show diff between Git and live state |
| `argocd app set <name> --sync-policy automated` | Enable automated sync |
| `argocd app set <name> --self-heal --auto-prune` | Enable self-heal and auto-prune |
| `argocd app wait <name> --health` | Wait for application to become Healthy |
| `argocd app history <name>` | Show deployment history |
| `argocd app rollback <name> <revision-id>` | Roll back to a previous revision |
| `argocd app manifests <name>` | Show rendered Kubernetes manifests |
| `argocd app delete <name> --yes` | Delete application (cascades to K8s resources) |
| `argocd proj create <name>` | Create a new project |
| `argocd proj list` | List all projects |
| `argocd context` | List all saved server contexts |
| `argocd context <name>` | Switch to a different ArgoCD server context |
Kubernetes Scheduling Tasks¶
- Hands-on Kubernetes exercises covering Node Affinity, Pod Affinity, Pod Anti-Affinity, Taints, Tolerations, and Topology Spread Constraints.
- Each task includes a description, scenario, and a detailed solution with step-by-step instructions.
- Practice these tasks to master fine-grained Pod placement and scheduling strategies.
Table of Contents¶
- 01. Label Nodes and Use nodeSelector
- 02. Required Node Affinity with Multiple Labels
- 03. Preferred Node Affinity with Weights
- 04. Pod Anti-Affinity for High Availability
- 05. Pod Affinity to Co-Locate Services
- 06. Taint a Node and Add a Toleration
- 07. NoExecute Taint with tolerationSeconds
- 08. Topology Spread Constraints
- 09. Combine Node Affinity with Taints
- 10. Debug a Pending Pod
01. Label Nodes and Use nodeSelector¶
Add a custom label to a node and schedule a Pod on it using nodeSelector.
Scenario:¶
- You have a node with SSD storage and want to ensure a database Pod only runs on it.
- nodeSelector is the simplest scheduling constraint.
Hint: kubectl label nodes, then use spec.nodeSelector in the Pod spec.
Solution
# 1. List nodes
kubectl get nodes
# 2. Label a node
kubectl label nodes <node-name> disk-type=ssd
# 3. Create a Pod with nodeSelector
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: ssd-pod
spec:
nodeSelector:
disk-type: ssd
containers:
- name: app
image: nginx:1.25
EOF
# 4. Verify the Pod landed on the correct node
kubectl get pod ssd-pod -o wide
# 5. Cleanup
kubectl delete pod ssd-pod
kubectl label nodes <node-name> disk-type-
02. Required Node Affinity with Multiple Labels¶
Schedule a Pod that requires nodes with BOTH environment=production AND zone=us-east labels.
Scenario:¶
- Your production workload must run in a specific zone on production-labeled nodes.
- Node Affinity with the In operator lets you express this constraint.
Hint: Use requiredDuringSchedulingIgnoredDuringExecution with multiple matchExpressions in a single nodeSelectorTerms entry.
Solution
# 1. Label nodes
kubectl label nodes <node-name> environment=production zone=us-east
# 2. Create the Pod
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: prod-east-pod
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: environment
operator: In
values: [production]
- key: zone
operator: In
values: [us-east]
containers:
- name: app
image: nginx:1.25
EOF
# 3. Verify
kubectl get pod prod-east-pod -o wide
kubectl describe pod prod-east-pod | grep "Node:"
# 4. Cleanup
kubectl delete pod prod-east-pod
kubectl label nodes <node-name> environment- zone-
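The semantics here are worth spelling out: matchExpressions inside a single term are ANDed, while multiple nodeSelectorTerms entries are ORed. A simplified Python model (supporting only the In operator) of how a node's labels are evaluated against the terms from this task:

```python
def term_matches(labels, match_expressions):
    """All expressions in one term must match (AND); only In is modeled."""
    return all(labels.get(e["key"]) in e["values"]
               for e in match_expressions if e["operator"] == "In")

def node_eligible(labels, node_selector_terms):
    """Any single matching term is enough (OR across terms)."""
    return any(term_matches(labels, t["matchExpressions"])
               for t in node_selector_terms)

# One term with two expressions: the node must satisfy BOTH.
terms = [{"matchExpressions": [
    {"key": "environment", "operator": "In", "values": ["production"]},
    {"key": "zone", "operator": "In", "values": ["us-east"]},
]}]

print(node_eligible({"environment": "production", "zone": "us-east"}, terms))
print(node_eligible({"environment": "production", "zone": "us-west"}, terms))
```

Splitting the two expressions into two separate nodeSelectorTerms entries would instead schedule onto any node that matches either label.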
03. Preferred Node Affinity with Weights¶
Deploy a Pod that strongly prefers production nodes (weight 80) and weakly prefers SSD nodes (weight 20).
Scenario:¶
- You want soft scheduling preferences: the Pod should schedule even if neither preference is met.
- Weights (1–100) let you prioritize multiple preferences.
Hint: Use preferredDuringSchedulingIgnoredDuringExecution with two entries at different weights.
Solution
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: weighted-pod
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 80
preference:
matchExpressions:
- key: environment
operator: In
values: [production]
- weight: 20
preference:
matchExpressions:
- key: disk-type
operator: In
values: [ssd]
containers:
- name: app
image: nginx:1.25
EOF
kubectl get pod weighted-pod -o wide
kubectl describe pod weighted-pod | grep "Node:"
# Cleanup
kubectl delete pod weighted-pod
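A rough model of what the scheduler's scoring step does with these weights: every feasible node earns the sum of the weights of the preferences it satisfies, and higher-scoring nodes are favored (the real scheduler combines this with many other scoring plugins). A simplified sketch with hypothetical node labels:

```python
# The two preferences from this task, reduced to weight + exact-match labels.
preferences = [
    {"weight": 80, "match": {"environment": "production"}},
    {"weight": 20, "match": {"disk-type": "ssd"}},
]

# Hypothetical nodes and their labels.
nodes = {
    "node-a": {"environment": "production", "disk-type": "hdd"},
    "node-b": {"environment": "staging", "disk-type": "ssd"},
    "node-c": {"environment": "production", "disk-type": "ssd"},
}

def score(labels):
    """Sum the weights of every preference the node's labels satisfy."""
    return sum(p["weight"] for p in preferences
               if all(labels.get(k) == v for k, v in p["match"].items()))

scores = {name: score(labels) for name, labels in nodes.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Note that a node matching neither preference still scores 0 and remains schedulable, which is exactly the "soft" behavior this task demonstrates.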
04. Pod Anti-Affinity for High Availability¶
Deploy a 3-replica Deployment where no two replicas land on the same node.
Scenario:¶
- For high availability, replicas should be spread across different nodes.
- If you have fewer nodes than replicas, some Pods will stay Pending with required anti-affinity.
Hint: Use podAntiAffinity with requiredDuringSchedulingIgnoredDuringExecution and topologyKey: kubernetes.io/hostname.
Solution
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: ha-web
spec:
replicas: 3
selector:
matchLabels:
app: ha-web
template:
metadata:
labels:
app: ha-web
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: ha-web
topologyKey: kubernetes.io/hostname
containers:
- name: web
image: nginx:1.25
EOF
# Verify each pod is on a different node
kubectl get pods -l app=ha-web -o wide
# Cleanup
kubectl delete deployment ha-web
05. Pod Affinity to Co-Locate Services¶
Deploy a cache Pod and an app Pod that must be on the same node as the cache.
Scenario:¶
- Your application benefits from sub-millisecond latency to the local cache.
- Pod Affinity ensures co-location on the same node.
Hint: Use podAffinity with topologyKey: kubernetes.io/hostname matching the cache Pod’s labels.
Solution
# 1. Deploy the cache
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: cache
labels:
app: cache
spec:
containers:
- name: redis
image: redis:7-alpine
EOF
# 2. Deploy the app with affinity to cache
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: app-near-cache
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: cache
topologyKey: kubernetes.io/hostname
containers:
- name: app
image: nginx:1.25
EOF
# 3. Verify both are on the same node
kubectl get pods cache app-near-cache -o wide
# Cleanup
kubectl delete pod cache app-near-cache
06. Taint a Node and Add a Toleration¶
Taint a node with NoSchedule, verify a regular Pod is rejected, then deploy a Pod with a matching toleration.
Scenario:¶
- You have dedicated GPU nodes that should only accept GPU workloads.
- Taints repel Pods; tolerations opt in specific Pods.
Hint: kubectl taint nodes, then add a tolerations block to the Pod spec.
Solution
# 1. Taint a node
kubectl taint nodes <node-name> dedicated=gpu:NoSchedule
# 2. Try a regular Pod (will stay Pending if this is the only node)
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: regular-pod
spec:
containers:
- name: app
image: nginx:1.25
EOF
kubectl describe pod regular-pod | grep -A5 "Events:"
# 3. Deploy a Pod with toleration
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: gpu-pod
spec:
tolerations:
- key: "dedicated"
operator: "Equal"
value: "gpu"
effect: "NoSchedule"
containers:
- name: app
image: nginx:1.25
EOF
kubectl get pod gpu-pod -o wide
# Cleanup
kubectl delete pod regular-pod gpu-pod
kubectl taint nodes <node-name> dedicated=gpu:NoSchedule-
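The matching rule behind this task can be modeled: a toleration tolerates a taint when the keys agree, the values agree (or the operator is Exists), and the effects agree (an unset toleration effect matches any effect). A simplified sketch of that check, not the full spec:

```python
def tolerates(toleration, taint):
    """Simplified taint/toleration match for the common cases in this lab."""
    if toleration.get("key") != taint["key"]:
        return False
    # An unset toleration effect matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # Exists ignores the value entirely; Equal (the default) compares it.
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value") == taint.get("value")

taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}

gpu_toleration = {"key": "dedicated", "operator": "Equal",
                  "value": "gpu", "effect": "NoSchedule"}

print(tolerates(gpu_toleration, taint))                              # matches
print(tolerates({"key": "dedicated", "operator": "Exists"}, taint))  # matches
print(tolerates({"key": "other"}, taint))                            # rejected
```

The regular-pod case from Step 2 is simply a Pod with no tolerations at all, so no entry can match the taint and the Pod stays Pending.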
07. NoExecute Taint with tolerationSeconds¶
Deploy a Pod with a NoExecute toleration and tolerationSeconds: 60. Apply the taint and observe the Pod being evicted after 60 seconds.
Scenario:¶
- During planned maintenance, you want to give running Pods a grace period before eviction.
- tolerationSeconds controls how long a Pod survives after the taint is applied.
Hint: Use effect: NoExecute with tolerationSeconds in the Pod toleration.
Solution
# 1. Deploy a Pod with tolerationSeconds
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: graceful-pod
spec:
tolerations:
- key: "maintenance"
operator: "Equal"
value: "true"
effect: "NoExecute"
tolerationSeconds: 60
containers:
- name: app
image: nginx:1.25
EOF
# 2. Verify it's running
kubectl get pod graceful-pod -o wide
NODE=$(kubectl get pod graceful-pod -o jsonpath='{.spec.nodeName}')
# 3. Taint the node with NoExecute
kubectl taint nodes $NODE maintenance=true:NoExecute
# 4. Watch the Pod - it survives ~60s then is evicted
kubectl get pod graceful-pod -w
# Cleanup
kubectl taint nodes $NODE maintenance=true:NoExecute-
08. Topology Spread Constraints¶
Deploy 6 replicas of a Deployment with maxSkew: 1 across availability zones.
Scenario:¶
- You need even distribution of pods across zones for resilience.
- Topology Spread Constraints provide finer control than Anti-Affinity.
Hint: Use topologySpreadConstraints with topologyKey: topology.kubernetes.io/zone.
Solution
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: zone-spread
spec:
replicas: 6
selector:
matchLabels:
app: zone-spread
template:
metadata:
labels:
app: zone-spread
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: zone-spread
containers:
- name: app
image: nginx:1.25
EOF
# Verify distribution
kubectl get pods -l app=zone-spread -o wide
# Cleanup
kubectl delete deployment zone-spread
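maxSkew can be modeled directly: skew is the difference between the most- and least-populated topology domains, and with whenUnsatisfiable: DoNotSchedule the scheduler rejects any placement that would push skew past the limit. A toy Python model with hypothetical zone counts:

```python
def skew_after_placement(counts, zone):
    """Global skew (max - min across zones) if one more pod lands in `zone`."""
    trial = dict(counts)
    trial[zone] += 1
    return max(trial.values()) - min(trial.values())

def allowed_zones(counts, max_skew):
    """Zones where a new pod keeps skew <= maxSkew (DoNotSchedule semantics)."""
    return [z for z in counts if skew_after_placement(counts, z) <= max_skew]

# Hypothetical current distribution of matching pods per zone.
counts = {"us-east-1a": 2, "us-east-1b": 1, "us-east-1c": 1}
print(allowed_zones(counts, max_skew=1))
```

With maxSkew: 1, the next pod cannot land in us-east-1a (that would make the spread 3/1/1, skew 2), which is how the constraint forces the even distribution this task asks for.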
09. Combine Node Affinity with Taints¶
Create a dedicated node pool pattern: taint the node (repel others) and use Node Affinity (attract your Pods).
Scenario:¶
- You want to isolate monitoring workloads on dedicated nodes.
- The pattern is: Taint (repel) + Affinity (attract) + Toleration (allow).
Hint: Label and taint the node, then create a Pod with both nodeAffinity and tolerations.
Solution
# 1. Setup the dedicated node
kubectl label nodes <node-name> role=monitoring
kubectl taint nodes <node-name> role=monitoring:NoSchedule
# 2. Deploy a monitoring Pod
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: monitor-pod
spec:
tolerations:
- key: "role"
operator: "Equal"
value: "monitoring"
effect: "NoSchedule"
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: role
operator: In
values: [monitoring]
containers:
- name: prometheus
image: prom/prometheus:latest
EOF
# 3. Verify it landed on the correct node
kubectl get pod monitor-pod -o wide
# Cleanup
kubectl delete pod monitor-pod
kubectl taint nodes <node-name> role=monitoring:NoSchedule-
kubectl label nodes <node-name> role-
10. Debug a Pending Pod¶
Given a Pod stuck in Pending, use kubectl commands to identify and resolve the scheduling failure.
Scenario:¶
- A colleague deployed a Pod that’s stuck in Pending. You need to diagnose the issue.
- Common causes: missing labels, unmatched taints, insufficient resources.
Hint: kubectl describe pod, kubectl get events, check node labels and taints.
Solution
# 1. Create a Pod with an impossible affinity (to simulate the issue)
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: stuck-pod
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: nonexistent-label
operator: In
values: [does-not-exist]
containers:
- name: app
image: nginx:1.25
EOF
# 2. Check pod status
kubectl get pod stuck-pod
# STATUS: Pending
# 3. Describe for scheduling events
kubectl describe pod stuck-pod | grep -A10 "Events:"
# "0/N nodes are available: N node(s) didn't match Pod's node affinity/selector"
# 4. Check node labels to understand what's available
kubectl get nodes --show-labels
# 5. Check node taints
kubectl describe nodes | grep -A3 "Taints:"
# 6. Fix: either label a node or remove the affinity constraint
# Option A: Label a node to satisfy the affinity
kubectl label nodes <node-name> nonexistent-label=does-not-exist
# 7. Verify the Pod is now scheduled
kubectl get pod stuck-pod -o wide
# Cleanup
kubectl delete pod stuck-pod
kubectl label nodes <node-name> nonexistent-label-
Kubernetes Kubebuilder Tasks¶
- Hands-on Kubernetes exercises covering Kubebuilder operator development, CRD creation, reconciliation loops, webhooks, and testing.
- Each task includes a description, scenario, and a detailed solution with step-by-step instructions.
- Practice these tasks to master building production-grade Kubernetes operators.
Table of Contents¶
- 01. Initialize a Kubebuilder Project
- 02. Create a CRD API and Controller
- 03. Define CRD Types with Validation Markers
- 04. Generate and Install CRDs
- 05. Implement a Basic Reconciler
- 06. Run the Controller Locally
- 07. Add Owner References for Garbage Collection
- 08. Update Status Subresource
- 09. Add a Finalizer
- 10. Write a Controller Test with envtest
01. Initialize a Kubebuilder Project¶
Scaffold a new operator project using kubebuilder init and explore the generated files.
Scenario:¶
- You’re starting a new operator project and need the project skeleton.
- kubebuilder init creates the Makefile, Go module, and base Kustomize configs.
Hint: kubebuilder init --domain <domain> --repo <module>
Solution
# 1. Create and enter project directory
mkdir my-operator && cd my-operator
# 2. Initialize the project
kubebuilder init \
--domain example.com \
--repo example.com/my-operator
# 3. Explore generated files
ls -la
cat go.mod
cat Makefile | head -30
cat cmd/main.go | head -20
# 4. View available Make targets
make help
# Cleanup (if needed)
cd .. && rm -rf my-operator
02. Create a CRD API and Controller¶
Use kubebuilder create api to scaffold a new CRD type and its controller.
Scenario:¶
- You need a custom resource called MyApp in the apps group.
- Kubebuilder scaffolds both the Go type and the controller stub.
Hint: kubebuilder create api --group apps --version v1 --kind MyApp
Solution
# 1. Create the API (answer y to both prompts)
kubebuilder create api \
--group apps \
--version v1 \
--kind MyApp
# 2. Inspect the generated type
cat api/v1/myapp_types.go
# 3. Inspect the generated controller
cat internal/controller/myapp_controller.go
# 4. Check that main.go was updated
grep MyApp cmd/main.go
03. Define CRD Types with Validation Markers¶
Add fields to the CRD spec with Kubebuilder validation markers for min/max, enums, and defaults.
Scenario:¶
- Your CRD needs a replicas field (1–10, default 1) and a tier field (enum: basic/premium).
- Markers auto-generate OpenAPI v3 validation in the CRD YAML.
Hint: Use //+kubebuilder:validation:Minimum=1, //+kubebuilder:default=1, //+kubebuilder:validation:Enum=basic;premium.
Solution
// Edit api/v1/myapp_types.go - replace MyAppSpec:
type MyAppSpec struct {
// Replicas is the desired number of pods.
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=10
// +kubebuilder:default=1
Replicas int32 `json:"replicas,omitempty"`
// Tier is the service tier.
// +kubebuilder:validation:Enum=basic;premium
// +kubebuilder:default=basic
Tier string `json:"tier,omitempty"`
// Message is displayed by the application.
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=200
Message string `json:"message"`
}
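Once these markers are generated into the CRD, the API server enforces them at admission. For example, this sample CR (deliberately violating the markers above) would be rejected by kubectl apply:

```yaml
apiVersion: apps.example.com/v1
kind: MyApp
metadata:
  name: bad-sample
spec:
  replicas: 15   # rejected: maximum is 10
  tier: gold     # rejected: must be basic or premium
  message: "hello"
```

Omitting replicas or tier entirely is fine: the defaulting markers fill in 1 and basic.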
04. Generate and Install CRDs¶
Run make manifests, make install, and verify the CRD is registered in the cluster.
Scenario:¶
- After defining your types, you need to generate the CRD YAML and apply it to the cluster.
- This makes kubectl get myapps work.
Hint: make generate && make manifests && make install
Solution
# 1. Generate deepcopy + CRD + RBAC
make generate
make manifests
# 2. Install CRDs into the cluster
make install
# 3. Verify the CRD exists
kubectl get crds | grep example.com
kubectl describe crd myapps.apps.example.com
# 4. Test that the API resource is available
kubectl get myapps
# "No resources found in default namespace."
# 5. Check the short name (if configured)
kubectl api-resources --api-group=apps.example.com
05. Implement a Basic Reconciler¶
Write a reconciler that creates a Deployment when a CR is created.
Scenario:¶
- When a user creates a MyApp CR, your controller should create a corresponding Deployment.
- The reconciler fetches the CR, checks if a Deployment exists, and creates it if missing.
Hint: Use r.Get() to fetch, errors.IsNotFound() to check, r.Create() to create.
Solution
// In internal/controller/myapp_controller.go - Reconcile method.
// Import aliases assumed here: myappv1 = your api/v1 package (group
// apps.example.com), appsv1 = k8s.io/api/apps/v1 - two different packages,
// so they must not share a single alias.
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// Fetch the CR; NotFound means it was deleted and there is nothing to do
	myapp := &myappv1.MyApp{}
	if err := r.Get(ctx, req.NamespacedName, myapp); err != nil {
		if errors.IsNotFound(err) {
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}

	// Check if the Deployment already exists
	dep := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{
		Name: myapp.Name, Namespace: myapp.Namespace,
	}, dep)
	if errors.IsNotFound(err) {
		logger.Info("Creating Deployment", "name", myapp.Name)
		// Build the Deployment and make the CR its owner
		dep = buildDeployment(myapp)
		if err := ctrl.SetControllerReference(myapp, dep, r.Scheme); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{}, r.Create(ctx, dep)
	}
	return ctrl.Result{}, err
}
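The buildDeployment helper referenced above is not shown in the lab; a minimal sketch follows (assumptions: myappv1 aliases your api/v1 package, appsv1/corev1/metav1 are the standard k8s.io imports, and the nginx image and app label are placeholders, not part of the lab):

```go
// buildDeployment returns a Deployment derived from the MyApp spec.
// Sketch only - image and labels below are placeholder assumptions.
func buildDeployment(myapp *myappv1.MyApp) *appsv1.Deployment {
	labels := map[string]string{"app": myapp.Name}
	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      myapp.Name,
			Namespace: myapp.Namespace,
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &myapp.Spec.Replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "app",
						Image: "nginx:1.25", // placeholder image
					}},
				},
			},
		},
	}
}
```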
06. Run the Controller Locally¶
Use make run to run the operator on your machine against the cluster.
Scenario:¶
- During development, you run the controller locally using your kubeconfig.
- This is faster than building a Docker image for every change.
Hint: make install && make run
Solution
# 1. Ensure CRDs are installed
make install
# 2. Run the controller
make run
# INFO Starting manager
# INFO Starting Controller {"controller": "myapp"}
# 3. In another terminal, create and delete CRs to test
kubectl apply -f config/samples/apps_v1_myapp.yaml
kubectl get myapps
kubectl delete myapp my-myapp
# 4. Stop the controller with Ctrl+C
07. Add Owner References for Garbage Collection¶
Set owner references on child resources so they are automatically deleted when the parent CR is deleted.
Scenario:¶
- When a user deletes a MyApp CR, the Deployment, Service, and ConfigMap should be cleaned up.
- Owner references enable Kubernetes garbage collection.
Hint: Use ctrl.SetControllerReference(parent, child, r.Scheme) before creating the child.
Solution
// Before r.Create(ctx, deployment):
if err := ctrl.SetControllerReference(myapp, deployment, r.Scheme); err != nil {
	return ctrl.Result{}, err
}
# Test: create a CR, verify child resources exist
kubectl apply -f config/samples/apps_v1_myapp.yaml
kubectl get deployment -l app.kubernetes.io/managed-by=my-operator
# Verify owner reference is set
kubectl get deployment <name> -o jsonpath='{.metadata.ownerReferences}' | jq
# Delete the CR - children should be garbage-collected
kubectl delete myapp my-myapp
kubectl get deployments # Should be gone
08. Update Status Subresource¶
Update the CR’s .status fields to reflect the current state of managed resources.
Scenario:¶
- Users need to see the current state (e.g., available replicas, phase) via kubectl get myapps.
- Status updates use the /status subresource to avoid triggering spec watches.
Hint: Use r.Status().Update(ctx, updated) after computing the status.
Solution
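A minimal status-update sketch (hedged: the Phase and AvailableReplicas fields on MyAppStatus are assumptions - define them in api/v1/myapp_types.go and add the +kubebuilder:subresource:status marker to the MyApp type):

```go
// In Reconcile(), after the managed Deployment (dep) has been fetched.
// Phase and AvailableReplicas are assumed fields on MyAppStatus.
myapp.Status.AvailableReplicas = dep.Status.AvailableReplicas
if dep.Status.AvailableReplicas == myapp.Spec.Replicas {
	myapp.Status.Phase = "Ready"
} else {
	myapp.Status.Phase = "Progressing"
}
// r.Status().Update() writes only the /status subresource; it does not
// modify .spec and therefore does not retrigger spec-based watches.
if err := r.Status().Update(ctx, myapp); err != nil {
	return ctrl.Result{}, err
}
```

After a reconcile, kubectl get myapp my-myapp -o jsonpath='{.status}' should show the computed phase.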
09. Add a Finalizer¶
Implement a finalizer that runs custom cleanup logic before the CR is deleted.
Scenario:¶
- Your operator manages external resources (e.g., DNS records, cloud storage) that need cleanup.
- Finalizers prevent deletion until cleanup is done.
Hint: Use controllerutil.AddFinalizer/RemoveFinalizer, check DeletionTimestamp.IsZero().
Solution
const myFinalizer = "apps.example.com/finalizer"

// In Reconcile(), after fetching the CR:
if myapp.DeletionTimestamp.IsZero() {
	// CR is not being deleted - make sure our finalizer is registered
	if !controllerutil.ContainsFinalizer(myapp, myFinalizer) {
		controllerutil.AddFinalizer(myapp, myFinalizer)
		return ctrl.Result{}, r.Update(ctx, myapp)
	}
} else {
	// CR is being deleted - run cleanup, then release the finalizer
	if controllerutil.ContainsFinalizer(myapp, myFinalizer) {
		logger.Info("Running cleanup for", "name", myapp.Name)
		// Do external cleanup here...
		controllerutil.RemoveFinalizer(myapp, myFinalizer)
		return ctrl.Result{}, r.Update(ctx, myapp)
	}
	return ctrl.Result{}, nil
}
10. Write a Controller Test with envtest¶
Write a Ginkgo/Gomega integration test that verifies your controller creates a Deployment.
Scenario:¶
- You need automated tests for your operator that run without a real cluster.
- envtest starts a local API server and etcd for testing.
Hint: Use k8sClient.Create() to create a CR, then Eventually() to wait for the Deployment.
Solution
// internal/controller/myapp_controller_test.go
var _ = Describe("MyApp Controller", func() {
	ctx := context.Background()

	It("should create a Deployment when a MyApp is created", func() {
		myapp := &v1.MyApp{
			ObjectMeta: metav1.ObjectMeta{
				Name:      "test-app",
				Namespace: "default",
			},
			Spec: v1.MyAppSpec{
				Replicas: 2,
				Message:  "test",
			},
		}
		Expect(k8sClient.Create(ctx, myapp)).To(Succeed())

		deployment := &appsv1.Deployment{}
		Eventually(func() error {
			return k8sClient.Get(ctx, types.NamespacedName{
				Name:      "test-app",
				Namespace: "default",
			}, deployment)
		}, time.Second*30, time.Millisecond*250).Should(Succeed())

		Expect(*deployment.Spec.Replicas).To(Equal(int32(2)))
	})
})
Kubernetes KEDA Tasks¶
- Hands-on Kubernetes exercises covering KEDA (Kubernetes Event-Driven Autoscaling) installation, ScaledObjects, ScaledJobs, TriggerAuthentication, and real-world autoscaling patterns.
- Each task includes a description, scenario, and a detailed solution with step-by-step instructions.
- Practice these tasks to master event-driven autoscaling with KEDA.
Table of Contents¶
- 01. Install KEDA via Helm
- 02. Create a CPU-Based ScaledObject
- 03. Scale to Zero with Redis Queue
- 04. Schedule Scaling with the Cron Trigger
- 05. Use TriggerAuthentication with Secrets
- 06. Combine Multiple Triggers
- 07. Create a ScaledJob for Batch Processing
- 08. Tune Scale-Up and Scale-Down Behavior
- 09. Pause and Resume a ScaledObject
- 10. Troubleshoot a Non-Scaling ScaledObject
01. Install KEDA via Helm¶
Install KEDA on a Kubernetes cluster using the official Helm chart and verify all components.
Scenario:¶
- Your team wants to adopt event-driven autoscaling for queue-based workers.
- KEDA extends the native HPA with 60+ event source scalers.
Hint: helm repo add kedacore, helm upgrade --install keda
Solution
# 1. Add the KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts
helm repo update kedacore
# 2. Install KEDA
helm upgrade --install keda kedacore/keda \
--namespace keda \
--create-namespace \
--wait
# 3. Verify pods are running
kubectl get pods -n keda
# keda-admission-webhooks-xxxx 1/1 Running
# keda-operator-xxxx 1/1 Running
# keda-operator-metrics-apiserver 1/1 Running
# 4. Verify CRDs are registered
kubectl get crd | grep keda
# scaledobjects.keda.sh
# scaledjobs.keda.sh
# triggerauthentications.keda.sh
# clustertriggerauthentications.keda.sh
# 5. Verify the metrics API
kubectl get apiservice | grep keda
02. Create a CPU-Based ScaledObject¶
Create a ScaledObject that scales a Deployment based on CPU utilization (threshold: 60%).
Scenario:¶
- You want to replace your existing HPA with KEDA to later add queue-based triggers.
- The CPU scaler works identically to HPA but can be combined with other KEDA scalers.
Hint: Use type: cpu with metadata.type: Utilization and metadata.value: "60".
Solution
# 1. Create a namespace and deployment
kubectl create namespace keda-tasks
kubectl create deployment nginx-demo \
--image=nginx:1.25 \
--replicas=1 \
--namespace=keda-tasks
kubectl set resources deployment nginx-demo \
--requests=cpu=50m,memory=64Mi \
--namespace=keda-tasks
# 2. Apply the ScaledObject
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-scaler
  namespace: keda-tasks
spec:
  scaleTargetRef:
    name: nginx-demo
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: cpu
    metadata:
      type: Utilization
      value: "60"
EOF
# 3. Verify KEDA created an HPA
kubectl get hpa -n keda-tasks
kubectl get scaledobject -n keda-tasks
# Cleanup
kubectl delete namespace keda-tasks
03. Scale to Zero with Redis Queue¶
Deploy a Redis-backed worker that scales from 0 to N based on queue depth, and back to 0 when empty.
Scenario:¶
- Idle workers waste resources. You want pods only when there’s work.
- KEDA monitors the Redis list length and scales workers accordingly.
Hint: Set minReplicaCount: 0 and use the redis scaler with listName and listLength.
Solution
# 1. Create namespace and deploy Redis
kubectl create namespace keda-tasks
kubectl create deployment redis --image=redis:7-alpine -n keda-tasks
kubectl expose deployment redis --port=6379 -n keda-tasks
# 2. Create a worker deployment (starting at 0)
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: queue-worker
  namespace: keda-tasks
spec:
  replicas: 0
  selector:
    matchLabels:
      app: queue-worker
  template:
    metadata:
      labels:
        app: queue-worker
    spec:
      containers:
      - name: worker
        image: redis:7-alpine
        command: ["/bin/sh", "-c"]
        args:
        - |
          while true; do
            JOB=$(redis-cli -h redis LPOP work:queue)
            if [ -n "$JOB" ]; then echo "Processing: $JOB"; sleep 2
            else sleep 1; fi
          done
EOF
# 3. Create the ScaledObject
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-scaler
  namespace: keda-tasks
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 0
  maxReplicaCount: 10
  cooldownPeriod: 30
  pollingInterval: 5
  triggers:
  - type: redis
    metadata:
      address: redis.keda-tasks.svc:6379
      listName: work:queue
      listLength: "5"
EOF
# 4. Verify 0 pods
kubectl get pods -n keda-tasks -l app=queue-worker
# 5. Push jobs and watch scale-up
kubectl exec deployment/redis -n keda-tasks -- \
redis-cli RPUSH work:queue j1 j2 j3 j4 j5 j6 j7 j8 j9 j10 j11 j12 j13 j14 j15
kubectl get pods -n keda-tasks -l app=queue-worker -w
# Cleanup
kubectl delete namespace keda-tasks
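Why three workers for the 15 jobs pushed above? KEDA's redis trigger treats listLength as an average target per replica, so the desired replica count is roughly ceil(pending / listLength), capped by maxReplicaCount. A quick offline check of that arithmetic:

```shell
# ceil(pending / listLength), capped at maxReplicaCount - here ceil(15/5) = 3
pending=15; per_replica=5; max=10
desired=$(( (pending + per_replica - 1) / per_replica ))
if [ "$desired" -gt "$max" ]; then desired=$max; fi
echo "desired replicas: $desired"
# desired replicas: 3
```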
04. Schedule Scaling with the Cron Trigger¶
Create a ScaledObject that scales to 5 replicas during business hours (MonβFri, 08:00β18:00).
Scenario:¶
- Your API needs pre-warmed capacity every weekday morning.
- The Cron scaler provides time-based replica scheduling.
Hint: Use type: cron with start, end, timezone, and desiredReplicas.
Solution
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cron-scaler
  namespace: keda-tasks
spec:
  scaleTargetRef:
    name: nginx-demo
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: cron
    metadata:
      timezone: "UTC"
      start: "0 8 * * 1-5"
      end: "0 18 * * 1-5"
      desiredReplicas: "5"
EOF
# Check the ScaledObject
kubectl describe scaledobject cron-scaler -n keda-tasks
05. Use TriggerAuthentication with Secrets¶
Create a TriggerAuthentication backed by a Kubernetes Secret and reference it in a ScaledObject.
Scenario:¶
- Your Redis requires authentication and you don’t want the password in the ScaledObject.
- TriggerAuthentication separates credentials from scaling configuration.
Hint: Create a Secret, create a TriggerAuthentication with secretTargetRef, then use authenticationRef in the ScaledObject.
Solution
# 1. Create the Secret
kubectl create secret generic redis-creds \
--namespace keda-tasks \
--from-literal=password='s3cret'
# 2. Create TriggerAuthentication
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: redis-auth
  namespace: keda-tasks
spec:
  secretTargetRef:
  - parameter: password
    name: redis-creds
    key: password
EOF
# 3. Reference it in a ScaledObject
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: auth-scaler
  namespace: keda-tasks
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
  - type: redis
    authenticationRef:
      name: redis-auth
    metadata:
      # Fully qualified: the KEDA operator resolves this from the keda namespace
      address: redis.keda-tasks.svc:6379
      listName: secure:queue
      listLength: "5"
EOF
# 4. Verify
kubectl get triggerauthentication -n keda-tasks
kubectl describe scaledobject auth-scaler -n keda-tasks
06. Combine Multiple Triggers¶
Create a ScaledObject with both a Cron trigger and a CPU trigger in a single resource.
Scenario:¶
- You need a baseline of 3 pods during work hours, but CPU-driven bursting beyond that.
- KEDA evaluates all triggers and uses the maximum demanded replicas.
Hint: Add multiple entries in the triggers list.
Solution
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: multi-trigger
  namespace: keda-tasks
spec:
  scaleTargetRef:
    name: nginx-demo
  minReplicaCount: 1
  maxReplicaCount: 15
  triggers:
  - type: cron
    metadata:
      timezone: "UTC"
      start: "0 8 * * 1-5"
      end: "0 18 * * 1-5"
      desiredReplicas: "3"
  - type: cpu
    metadata:
      type: Utilization
      value: "60"
EOF
# KEDA uses whichever trigger demands MORE replicas
kubectl get hpa -n keda-tasks
kubectl describe scaledobject multi-trigger -n keda-tasks
07. Create a ScaledJob for Batch Processing¶
Create a ScaledJob that spawns one Job per batch of 5 items in a Redis list.
Scenario:¶
- Each batch task (e.g., video transcoding, report generation) runs as a short-lived Job.
- ScaledJob creates new Jobs (not replica scaling) - one per event batch.
Hint: Use kind: ScaledJob with jobTargetRef instead of scaleTargetRef.
Solution
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: batch-job
  namespace: keda-tasks
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    backoffLimit: 2
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: processor
          image: redis:7-alpine
          command: ["/bin/sh", "-c"]
          args:
          - |
            for i in $(seq 1 5); do
              JOB=$(redis-cli -h redis LPOP batch:queue)
              [ -n "$JOB" ] && echo "Processing: $JOB" && sleep 1
            done
  minReplicaCount: 0
  maxReplicaCount: 20
  pollingInterval: 10
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 3
  triggers:
  - type: redis
    metadata:
      # Fully qualified: the KEDA operator resolves this from the keda namespace
      address: redis.keda-tasks.svc:6379
      listName: batch:queue
      listLength: "5"
EOF
# Push items
kubectl exec deployment/redis -n keda-tasks -- \
redis-cli RPUSH batch:queue b1 b2 b3 b4 b5 b6 b7 b8 b9 b10
# Watch Jobs
kubectl get jobs -n keda-tasks -w
kubectl get scaledjob -n keda-tasks
08. Tune Scale-Up and Scale-Down Behavior¶
Configure a ScaledObject with custom HPA behavior: fast scale-up, slow scale-down with a 2-minute stabilization window.
Scenario:¶
- Your service is latency-sensitive - scale up fast, but avoid flapping by scaling down slowly.
- KEDA supports the same behavior config as native HPA.
Hint: Use spec.advanced.horizontalPodAutoscalerConfig.behavior.
Solution
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: tuned-scaler
  namespace: keda-tasks
spec:
  scaleTargetRef:
    name: nginx-demo
  minReplicaCount: 1
  maxReplicaCount: 20
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
          - type: Pods
            value: 4
            periodSeconds: 15
        scaleDown:
          stabilizationWindowSeconds: 120
          policies:
          - type: Pods
            value: 1
            periodSeconds: 60
  triggers:
  - type: cpu
    metadata:
      type: Utilization
      value: "60"
EOF
kubectl describe hpa -n keda-tasks
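With the scaleUp policy above (at most 4 pods added per 15-second period, no stabilization window), going from 1 replica to the 20-replica cap takes roughly five periods. A quick offline simulation of that policy:

```shell
# Simulate the scaleUp policy: +4 pods per 15 s period, capped at 20 replicas
replicas=1; max=20; step=4; period=15; elapsed=0
while [ "$replicas" -lt "$max" ]; do
  replicas=$((replicas + step))
  if [ "$replicas" -gt "$max" ]; then replicas=$max; fi
  elapsed=$((elapsed + period))
done
echo "reached ${replicas} replicas after ${elapsed}s"
# reached 20 replicas after 75s
```

The scaleDown side is far slower by design: 1 pod per 60 s, and only after the 120 s stabilization window - which is what prevents flapping.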
09. Pause and Resume a ScaledObject¶
Temporarily pause KEDA scaling at a fixed replica count, then resume.
Scenario:¶
- You’re performing maintenance on the metric source (e.g., Redis migration).
- You need to freeze replicas at the current count without deleting the ScaledObject.
Hint: Use the autoscaling.keda.sh/paused-replicas annotation.
Solution
# 1. Pause at 3 replicas
kubectl annotate scaledobject cpu-scaler \
-n keda-tasks \
autoscaling.keda.sh/paused-replicas="3"
# 2. Verify paused
kubectl get scaledobject cpu-scaler -n keda-tasks -o yaml | grep -A2 annotations
kubectl get deployment nginx-demo -n keda-tasks
# 3. Resume
kubectl annotate scaledobject cpu-scaler \
-n keda-tasks \
autoscaling.keda.sh/paused-replicas-
# 4. Verify resumed
kubectl describe scaledobject cpu-scaler -n keda-tasks
10. Troubleshoot a Non-Scaling ScaledObject¶
Diagnose why a ScaledObject isn’t scaling and fix the issue.
Scenario:¶
- A ScaledObject was applied but the Deployment stays at its initial replica count.
- You need to check status conditions, KEDA operator logs, and the managed HPA.
Hint: kubectl describe scaledobject, kubectl logs -n keda, kubectl get hpa.
Solution
# 1. Check ScaledObject status
kubectl describe scaledobject <name> -n <namespace>
# Look for:
# Ready: True/False
# Active: True/False
# External Metric Names
# 2. Check the KEDA-managed HPA
kubectl get hpa -n <namespace>
kubectl describe hpa keda-hpa-<name> -n <namespace>
# 3. Check KEDA operator logs for errors
kubectl logs -n keda -l app=keda-operator --tail=100
# 4. Common issues:
# - Wrong address/host for the scaler → fix metadata.address
# - Missing TriggerAuthentication → create one or fix the reference
# - ScaledObject targeting wrong Deployment name → fix scaleTargetRef.name
# - CRD validation error → check Events section
# 5. Verify metric source connectivity
kubectl run debug --rm -it --image=busybox -n <namespace> --restart=Never \
-- sh -c "nc -zv redis.keda-tasks.svc 6379"
# 6. Check if metrics are being exposed
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq '.resources[].name'
Kubernetes Harbor + ArgoCD Airgap Tasks¶
- Hands-on Kubernetes exercises covering Harbor registry installation, Nginx Ingress setup, fully offline/airgap ArgoCD deployment, Helm chart creation, and GitOps application delivery.
- Each task includes a description, scenario, explanation, and a detailed solution with step-by-step instructions and scripts.
- Practice these tasks to master end-to-end airgapped GitOps workflows using Harbor as a private container registry and ArgoCD as the deployment engine.
Table of Contents¶
- 01. Install Nginx Ingress Controller + Harbor Registry
- 02. Configure Harbor with Ingress (harbor.local)
- 03. Prepare ArgoCD for Full Offline/Airgap Install
- 04. Create a Git Repository with a Helm Chart
- 05. Deploy ArgoCD (Offline Install Using Harbor)
- 06. Create an ArgoCD Application to Deploy the Helm Chart
- Full Install Script (All Steps)
Architecture Overview¶
graph LR
subgraph Airgap GitOps Architecture
subgraph Online["Online Machine"]
OPS["docker pull\ndocker tag\ndocker push"]
end
subgraph Harbor["Harbor Registry\nharbor.local"]
PROJECTS["Projects:\n- argocd\n- library\n- helm-charts"]
end
subgraph K8s["Kubernetes Cluster"]
ARGOCD["ArgoCD (airgap)\nAll images from\nHarbor registry"]
APP["Application\n(Helm Chart)\nfrom Git repo"]
ARGOCD --> APP
end
subgraph Ingress["Nginx Ingress Controller"]
PORT["harbor.local:80"]
end
subgraph Git["Git Repository\n(local/remote)"]
CONTENTS["Contains:\n- Helm chart (my-app/)\n- ArgoCD Application manifest"]
end
Online -- "pull/push" --> Harbor
K8s -- "pull" --> Harbor
Ingress --- Harbor
end
01. Install Nginx Ingress Controller + Harbor Registry¶
Install the Nginx Ingress Controller and Harbor container registry on a Kubernetes cluster from the internet.
Scenario:¶
- You are setting up a private container registry environment.
- Harbor will serve as the local OCI registry for all container images and Helm charts.
- The Nginx Ingress Controller is required to expose Harbor (and later ArgoCD) via hostnames.
Explanation:¶
- Nginx Ingress Controller acts as a reverse proxy that routes HTTP/HTTPS traffic to services inside the cluster based on hostname and path rules.
- Harbor is an open-source container registry that supports image management, vulnerability scanning, RBAC, and Helm chart hosting. It is the backbone of an airgap deployment - all images are pre-loaded here.
- We install both from the internet first, then use Harbor to serve all images for the offline ArgoCD install.
Prerequisites: A running Kubernetes cluster (Kind, Minikube, or cloud-based), helm, kubectl, docker installed.
Hint: helm repo add, helm upgrade --install, kubectl get pods
Solution
#!/bin/bash
# =============================================================================
# Step 01 - Install Nginx Ingress Controller + Harbor Registry
# =============================================================================
set -e
# ── Color definitions ──
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
CYAN='\033[0;36m'
NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
success() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; exit 1; }
header() { echo -e "\n${CYAN}=== $* ===${NC}"; }
# ── 1. Add Helm repositories ──
header "Adding Helm Repositories"
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add harbor https://helm.goharbor.io
helm repo update
success "Helm repositories added and updated"
# ── 2. Install Nginx Ingress Controller ──
header "Installing Nginx Ingress Controller"
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.type=NodePort \
--set controller.service.nodePorts.http=30080 \
--set controller.service.nodePorts.https=30443 \
--set controller.admissionWebhooks.enabled=false \
--wait --timeout 5m
# Verify Ingress Controller pods
kubectl get pods -n ingress-nginx
success "Nginx Ingress Controller installed"
# ── 3. Get the Ingress IP/Node IP ──
header "Detecting Cluster Node IP"
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
info "Node IP: ${NODE_IP}"
# ── 4. Install Harbor ──
header "Installing Harbor Registry"
helm upgrade --install harbor harbor/harbor \
--namespace harbor \
--create-namespace \
--set expose.type=ingress \
--set expose.ingress.className=nginx \
--set expose.ingress.hosts.core=harbor.local \
--set expose.tls.enabled=false \
--set externalURL=http://harbor.local \
--set harborAdminPassword=Harbor12345 \
--set persistence.enabled=false \
--wait --timeout 10m
# Verify Harbor pods
kubectl get pods -n harbor
success "Harbor registry installed"
# ── 5. Add harbor.local to /etc/hosts ──
header "Configuring /etc/hosts"
if ! grep -q "harbor.local" /etc/hosts; then
echo "${NODE_IP} harbor.local" | sudo tee -a /etc/hosts
success "Added harbor.local to /etc/hosts"
else
warn "harbor.local already exists in /etc/hosts"
fi
# ── 6. Verify Harbor is accessible ──
header "Verifying Harbor Access"
# Wait for Ingress to be ready
sleep 10
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" http://harbor.local/api/v2.0/health 2>/dev/null || echo "000")
if [ "${HTTP_CODE}" = "200" ]; then
success "Harbor is healthy (HTTP ${HTTP_CODE})"
else
warn "Harbor returned HTTP ${HTTP_CODE} - it may still be starting up"
info "Try: curl http://harbor.local/api/v2.0/health"
fi
echo ""
info "Harbor UI: http://harbor.local"
info "Harbor Admin: admin / Harbor12345"
info "Ingress Node IP: ${NODE_IP}"
success "Step 01 complete!"
Key Concepts:¶
| Component | Purpose |
|---|---|
| Nginx Ingress Controller | Routes HTTP traffic to services based on hostname/path rules |
| Harbor | Private container registry + Helm chart repository |
| NodePort | Exposes Ingress on fixed ports (30080/30443) on each node |
| expose.type=ingress | Tells Harbor to create Ingress resources for external access |
| persistence.enabled=false | Uses emptyDir for lab purposes (data lost on pod restart) |
02. Configure Harbor with Ingress (harbor.local)¶
Verify Harbor is accessible via harbor.local, create projects, and configure Docker to trust the insecure registry.
Scenario:¶
- Harbor is installed but you need to verify the Ingress route works correctly.
- You need to create Harbor projects to organize images for the airgap deployment.
- Docker must be configured to allow pushing/pulling from the insecure (HTTP) registry.
Explanation:¶
- Harbor Projects are logical groupings for container images (similar to Docker Hub organizations).
- We create an argocd project to hold all ArgoCD-related images and a library project for general images.
- Since we use HTTP (not HTTPS), Docker needs harbor.local added to its insecure registries list.
Hint: curl, Harbor API, Docker daemon.json, docker login
Solution
#!/bin/bash
# =============================================================================
# Step 02 - Configure Harbor with Ingress (harbor.local)
# =============================================================================
set -e
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'
BLUE='\033[0;34m'; CYAN='\033[0;36m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
success() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; exit 1; }
header() { echo -e "\n${CYAN}=== $* ===${NC}"; }
HARBOR_URL="http://harbor.local"
HARBOR_USER="admin"
HARBOR_PASS="Harbor12345"
# ── 1. Verify Harbor health ──
header "Verifying Harbor Health"
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" ${HARBOR_URL}/api/v2.0/health)
if [ "${HTTP_CODE}" != "200" ]; then
error "Harbor is not healthy (HTTP ${HTTP_CODE}). Check pods: kubectl get pods -n harbor"
fi
success "Harbor is healthy"
# ── 2. Verify Ingress routing ──
header "Verifying Ingress Configuration"
kubectl get ingress -n harbor
info "Harbor Ingress rules:"
kubectl describe ingress -n harbor | grep -E "Host|Path|Backend"
# ── 3. Create Harbor projects via API ──
header "Creating Harbor Projects"
create_project() {
local project_name=$1
local response
response=$(curl -s -o /dev/null -w "%{http_code}" \
-X POST "${HARBOR_URL}/api/v2.0/projects" \
-H "Content-Type: application/json" \
-u "${HARBOR_USER}:${HARBOR_PASS}" \
-d "{\"project_name\": \"${project_name}\", \"public\": true}")
if [ "${response}" = "201" ]; then
success "Created project: ${project_name}"
elif [ "${response}" = "409" ]; then
warn "Project already exists: ${project_name}"
else
error "Failed to create project ${project_name} (HTTP ${response})"
fi
}
create_project "argocd"
create_project "library"
# ── 4. Verify projects ──
header "Listing Harbor Projects"
curl -s -u "${HARBOR_USER}:${HARBOR_PASS}" \
"${HARBOR_URL}/api/v2.0/projects" | \
python3 -m json.tool 2>/dev/null | grep -E '"name"|"project_id"' || \
curl -s -u "${HARBOR_USER}:${HARBOR_PASS}" \
"${HARBOR_URL}/api/v2.0/projects" | grep -o '"name":"[^"]*"'
# ── 5. Configure Docker for insecure registry ──
header "Configuring Docker for Insecure Registry"
DOCKER_DAEMON="/etc/docker/daemon.json"
info "Docker must trust harbor.local as an insecure registry (HTTP)."
info ""
info "Add the following to ${DOCKER_DAEMON}:"
echo ""
echo '{
  "insecure-registries": ["harbor.local"]
}'
echo ""
# Attempt to configure automatically (requires sudo)
if [ -f "${DOCKER_DAEMON}" ]; then
if grep -q "harbor.local" "${DOCKER_DAEMON}"; then
success "harbor.local already in insecure-registries"
else
warn "Please add harbor.local to insecure-registries manually"
info "Then restart Docker: sudo systemctl restart docker"
fi
else
info "Creating ${DOCKER_DAEMON} with insecure registry config"
sudo mkdir -p /etc/docker
echo '{"insecure-registries": ["harbor.local"]}' | sudo tee "${DOCKER_DAEMON}"
sudo systemctl restart docker 2>/dev/null || warn "Restart Docker manually"
success "Docker configured"
fi
# ── 6. Docker login to Harbor ──
header "Logging in to Harbor"
docker login harbor.local \
-u "${HARBOR_USER}" \
-p "${HARBOR_PASS}" && \
success "Docker login successful" || \
warn "Docker login failed - ensure Docker is configured for insecure registries"
# ── 7. Test push/pull ──
header "Testing Push/Pull"
docker pull busybox:latest
docker tag busybox:latest harbor.local/library/busybox:latest
docker push harbor.local/library/busybox:latest && \
success "Test push successful!" || \
warn "Test push failed - check Docker insecure registry config"
echo ""
info "Harbor URL: ${HARBOR_URL}"
info "Projects: argocd, library"
info "Credentials: ${HARBOR_USER} / ${HARBOR_PASS}"
success "Step 02 complete!"
Ingress Verification:¶
# Check the Ingress resource created by Harbor
kubectl get ingress -n harbor -o wide
# Expected output:
# NAME CLASS HOSTS ADDRESS PORTS AGE
# harbor-ingress nginx harbor.local 10.x.x.x 80 5m
# Test with curl
curl http://harbor.local/api/v2.0/systeminfo
# Returns Harbor version and system information
03. Prepare ArgoCD for Full Offline/Airgap Install¶
Identify, pull, tag, and push all required ArgoCD container images to the Harbor registry for a fully airgapped installation.
Scenario:¶
- Your production cluster has no internet access (airgap environment).
- All container images must be pre-loaded into the private Harbor registry before deploying ArgoCD.
- You need to identify every image the ArgoCD Helm chart will use and mirror them to Harbor.
Explanation:¶
- An airgap installation means no external network access - every container image must already exist in a local registry.
- The ArgoCD Helm chart deploys multiple components (server, controller, repo-server, redis, dex, notifications), each with its own container image.
- We use helm template to render all manifests and extract image references, then mirror them to Harbor.
- The ArgoCD Helm chart version and image tags are tightly coupled - always use matching versions.
Hint: helm template, grep image:, docker pull/tag/push, Harbor API
Solution
Complete List of ArgoCD Images (v2.13)¶
The following images are required for a full ArgoCD offline installation:
| Component | Image |
|---|---|
| ArgoCD Server | quay.io/argoproj/argocd:v2.13.3 |
| ArgoCD Application Controller | quay.io/argoproj/argocd:v2.13.3 |
| ArgoCD Repo Server | quay.io/argoproj/argocd:v2.13.3 |
| ArgoCD Notifications | quay.io/argoproj/argocd:v2.13.3 |
| ArgoCD ApplicationSet | quay.io/argoproj/argocd:v2.13.3 |
| Redis (HA Cache) | redis:7.4.2-alpine |
| Dex (OIDC Provider) | ghcr.io/dexidp/dex:v2.41.1 |
Note: ArgoCD uses a single image (quay.io/argoproj/argocd) for multiple components - the entrypoint command differs per component. The exact image tags may vary based on the Helm chart version. Always verify with helm template.
#!/bin/bash
# =============================================================================
# Step 03 - Prepare ArgoCD for Full Offline/Airgap Install
# =============================================================================
set -e
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'
BLUE='\033[0;34m'; CYAN='\033[0;36m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
success() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; exit 1; }
header() { echo -e "\n${CYAN}=== $* ===${NC}"; }
HARBOR_URL="harbor.local"
HARBOR_USER="admin"
HARBOR_PASS="Harbor12345"
ARGOCD_CHART_VERSION="7.7.12"
# ── 1. Add ArgoCD Helm repository ──
header "Adding ArgoCD Helm Repository"
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update argo
success "ArgoCD Helm repo added"
# ── 2. Discover all required images using helm template ──
header "Discovering Required Images"
info "Rendering ArgoCD Helm chart to extract image references..."
IMAGES=$(helm template argocd argo/argo-cd \
--version "${ARGOCD_CHART_VERSION}" \
--namespace argocd 2>/dev/null | \
grep -E "image:" | \
sed 's/.*image: *"\?\([^"]*\)"\?.*/\1/' | \
sort -u)
echo ""
info "Required images for ArgoCD ${ARGOCD_CHART_VERSION}:"
echo "βββββββββββββββββββββββββββββββββββββββββββββββββ"
echo "${IMAGES}" | while read -r img; do
echo " ${img}"
done
echo "βββββββββββββββββββββββββββββββββββββββββββββββββ"
# ── 3. Define the image mapping (source → Harbor target) ──
header "Building Image Mirror Map"
# ArgoCD images - all use the same base image with different entrypoints
declare -A IMAGE_MAP
while IFS= read -r img; do
# Extract the image name and tag
# Convert quay.io/argoproj/argocd:v2.13.3 → harbor.local/argocd/argocd:v2.13.3
# Convert redis:7.4.2-alpine → harbor.local/argocd/redis:7.4.2-alpine
# Convert ghcr.io/dexidp/dex:v2.41.1 → harbor.local/argocd/dex:v2.41.1
local_name=$(echo "${img}" | rev | cut -d'/' -f1 | rev) # e.g. argocd:v2.13.3
IMAGE_MAP["${img}"]="${HARBOR_URL}/argocd/${local_name}"
done <<< "${IMAGES}"
info "Image mirror mapping:"
for src in "${!IMAGE_MAP[@]}"; do
echo " ${src}"
echo " → ${IMAGE_MAP[$src]}"
done
# ── 4. Pull all images from the internet ──
header "Pulling Images from Internet"
for src in "${!IMAGE_MAP[@]}"; do
info "Pulling: ${src}"
docker pull "${src}" || error "Failed to pull ${src}"
success "Pulled: ${src}"
done
# ── 5. Tag images for Harbor ──
header "Tagging Images for Harbor"
for src in "${!IMAGE_MAP[@]}"; do
dst="${IMAGE_MAP[$src]}"
info "Tagging: ${src} → ${dst}"
docker tag "${src}" "${dst}"
success "Tagged: ${dst}"
done
# ── 6. Push images to Harbor ──
header "Pushing Images to Harbor"
docker login "${HARBOR_URL}" -u "${HARBOR_USER}" -p "${HARBOR_PASS}" || \
error "Docker login to Harbor failed"
for src in "${!IMAGE_MAP[@]}"; do
dst="${IMAGE_MAP[$src]}"
info "Pushing: ${dst}"
docker push "${dst}" || error "Failed to push ${dst}"
success "Pushed: ${dst}"
done
# ── 7. Verify images in Harbor ──
header "Verifying Images in Harbor"
info "Images in 'argocd' project:"
curl -s -u "${HARBOR_USER}:${HARBOR_PASS}" \
"http://${HARBOR_URL}/api/v2.0/projects/argocd/repositories" | \
python3 -m json.tool 2>/dev/null || \
curl -s -u "${HARBOR_USER}:${HARBOR_PASS}" \
"http://${HARBOR_URL}/api/v2.0/projects/argocd/repositories"
# ── 8. Save the ArgoCD Helm chart locally for offline use ──
header "Saving ArgoCD Helm Chart for Offline Use"
mkdir -p /tmp/argocd-airgap
helm pull argo/argo-cd \
--version "${ARGOCD_CHART_VERSION}" \
--destination /tmp/argocd-airgap/
ls -la /tmp/argocd-airgap/
success "Helm chart saved to /tmp/argocd-airgap/"
# ── 9. Push Helm chart to Harbor OCI registry (optional) ──
header "Pushing Helm Chart to Harbor OCI Registry"
helm push /tmp/argocd-airgap/argo-cd-${ARGOCD_CHART_VERSION}.tgz \
oci://${HARBOR_URL}/argocd 2>/dev/null && \
success "Helm chart pushed to Harbor OCI" || \
warn "OCI push skipped (Harbor may need OCI enabled or use chartmuseum)"
echo ""
echo "βββββββββββββββββββββββββββββββββββββββββββββββββ"
info "Summary of mirrored images:"
echo "βββββββββββββββββββββββββββββββββββββββββββββββββ"
for src in "${!IMAGE_MAP[@]}"; do
echo " ${IMAGE_MAP[$src]}"
done
echo "βββββββββββββββββββββββββββββββββββββββββββββββββ"
echo ""
success "Step 03 complete! All ArgoCD images are in Harbor."
Quick Reference: Image Discovery Commands¶
# Method 1: helm template (recommended - shows exact images)
helm template argocd argo/argo-cd --version 7.7.12 | grep "image:" | sort -u
# Method 2: helm show values (shows configurable image fields)
helm show values argo/argo-cd --version 7.7.12 | grep -A2 "repository:"
# Method 3: After install - check running pods
kubectl get pods -n argocd -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | sort -u
# Verify images in Harbor via API
curl -s -u admin:Harbor12345 http://harbor.local/api/v2.0/projects/argocd/repositories | python3 -m json.tool
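The mirror script above derives each Harbor target by keeping only the trailing `name:tag` segment of the source reference. A minimal sketch of that rewrite in isolation, using the lab's `harbor.local` registry and `argocd` project:

```shell
#!/bin/sh
# Rewrite a source image reference to its Harbor mirror target by keeping
# only the final "name:tag" segment and prefixing the Harbor project path.
# Note: references pinned by digest (name@sha256:...) would need extra handling.
harbor_target() {
  img="$1"
  local_name="${img##*/}"                  # strip registry/org prefix, e.g. argocd:v2.13.3
  echo "harbor.local/argocd/${local_name}"
}

harbor_target "quay.io/argoproj/argocd:v2.13.3"  # harbor.local/argocd/argocd:v2.13.3
harbor_target "redis:7.4.2-alpine"               # harbor.local/argocd/redis:7.4.2-alpine
harbor_target "ghcr.io/dexidp/dex:v2.41.1"       # harbor.local/argocd/dex:v2.41.1
```

Because only the last path segment is kept, images from different source registries can collide if they share a name; the three ArgoCD images above do not.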
04. Create a Git Repository with a Helm Chart¶
Create a local Git repository containing a sample Helm chart that ArgoCD will deploy.
Scenario:¶
- ArgoCD follows the GitOps model - the Git repository is the single source of truth.
- You need a Helm chart in a Git repository that ArgoCD can monitor and deploy.
- The chart deploys a simple nginx-based web application with configurable replicas and a custom welcome page.
Explanation:¶
- GitOps means the desired state of the cluster is declared in Git. ArgoCD watches the repo and syncs changes automatically.
- The Helm chart contains templates for a Deployment, Service, and ConfigMap.
- We use a local bare Git repository for the lab (simulating a remote Git server). In production, this would be GitHub, GitLab, or Gitea.
- ArgoCD can detect Helm charts automatically by the presence of Chart.yaml.
Hint: git init --bare, helm create, git push
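The bare-repo mechanics the solution relies on can be exercised in isolation; a minimal sketch (paths and commit content are illustrative):

```shell
#!/bin/sh
# Round trip through a bare repository: the bare repo plays the role of the
# remote Git server, and a clone of it is the working copy we push from.
set -e
base=$(mktemp -d)
git init --bare --quiet "${base}/apps.git"           # the "remote" side
git clone --quiet "${base}/apps.git" "${base}/work"  # the working copy
cd "${base}/work"
echo "# demo" > README.md
git add README.md
git -c user.email=lab@example.com -c user.name=lab \
  commit --quiet -m "initial commit"
git push --quiet origin HEAD                         # publish to the bare repo
# The bare repo now holds the commit, just like a hosted remote would:
git --git-dir="${base}/apps.git" log --oneline
```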
Solution
#!/bin/bash
# =============================================================================
# Step 04 - Create a Git Repository with a Helm Chart
# =============================================================================
set -e
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'
BLUE='\033[0;34m'; CYAN='\033[0;36m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
success() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; exit 1; }
header() { echo -e "\n${CYAN}=== $* ===${NC}"; }
REPO_BASE="/tmp/gitops-lab"
BARE_REPO="${REPO_BASE}/helm-apps.git"
WORK_DIR="${REPO_BASE}/helm-apps-workspace"
CHART_NAME="my-web-app"
# ── 1. Create a bare Git repository (simulates remote server) ──
header "Creating Bare Git Repository"
rm -rf "${REPO_BASE}"
mkdir -p "${REPO_BASE}"
git init --bare "${BARE_REPO}"
success "Bare repo created at ${BARE_REPO}"
# ── 2. Clone the bare repo into a working directory ──
header "Cloning Working Directory"
git clone "${BARE_REPO}" "${WORK_DIR}"
cd "${WORK_DIR}"
success "Working directory: ${WORK_DIR}"
# ── 3. Scaffold the Helm chart ──
header "Creating Helm Chart: ${CHART_NAME}"
helm create "${CHART_NAME}"
# ── 4. Customize Chart.yaml ──
cat > "${CHART_NAME}/Chart.yaml" << 'EOF'
apiVersion: v2
name: my-web-app
description: A simple web application deployed via ArgoCD GitOps
type: application
version: 0.1.0
appVersion: "1.25.0"
maintainers:
- name: platform-team
email: platform@example.com
EOF
success "Chart.yaml customized"
# ── 5. Customize values.yaml ──
cat > "${CHART_NAME}/values.yaml" << 'EOF'
replicaCount: 2
image:
repository: nginx
pullPolicy: IfNotPresent
tag: "1.25-alpine"
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
create: true
automount: true
annotations: {}
name: ""
podAnnotations: {}
podLabels: {}
podSecurityContext: {}
securityContext: {}
service:
type: ClusterIP
port: 80
ingress:
enabled: false
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 50m
memory: 64Mi
autoscaling:
enabled: false
volumes: []
volumeMounts: []
nodeSelector: {}
tolerations: []
affinity: {}
# Custom welcome page
welcomePage:
title: "GitOps Demo App"
message: "Deployed by ArgoCD from Harbor airgap registry!"
backgroundColor: "#1a1a2e"
textColor: "#e94560"
EOF
success "values.yaml customized"
# ── 6. Create a custom ConfigMap template for the welcome page ──
cat > "${CHART_NAME}/templates/configmap.yaml" << 'TEMPLATE'
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "my-web-app.fullname" . }}-html
labels:
{{- include "my-web-app.labels" . | nindent 4 }}
data:
index.html: |
<!DOCTYPE html>
<html>
<head>
<title>{{ .Values.welcomePage.title }}</title>
<style>
body {
font-family: 'Segoe UI', Arial, sans-serif;
display: flex;
justify-content: center;
align-items: center;
min-height: 100vh;
margin: 0;
background: {{ .Values.welcomePage.backgroundColor }};
color: {{ .Values.welcomePage.textColor }};
}
.container { text-align: center; }
h1 { font-size: 2.5em; margin-bottom: 0.5em; }
.info { font-size: 1.2em; margin: 8px 0; color: #eee; }
.badge {
display: inline-block;
background: {{ .Values.welcomePage.textColor }};
color: white;
padding: 5px 15px;
border-radius: 20px;
margin: 5px;
font-size: 0.9em;
}
</style>
</head>
<body>
<div class="container">
<h1>{{ .Values.welcomePage.title }}</h1>
<p class="info">{{ .Values.welcomePage.message }}</p>
<p class="info">
<span class="badge">Release: {{ .Release.Name }}</span>
<span class="badge">Namespace: {{ .Release.Namespace }}</span>
</p>
<p class="info">
<span class="badge">Chart: {{ .Chart.Name }}-{{ .Chart.Version }}</span>
<span class="badge">App: {{ .Chart.AppVersion }}</span>
</p>
</div>
</body>
</html>
TEMPLATE
success "ConfigMap template created"
# ── 7. Update the Deployment to mount the ConfigMap ──
cat > "${CHART_NAME}/templates/deployment.yaml" << 'TEMPLATE'
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "my-web-app.fullname" . }}
labels:
{{- include "my-web-app.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "my-web-app.selectorLabels" . | nindent 6 }}
template:
metadata:
annotations:
checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
{{- with .Values.podAnnotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "my-web-app.labels" . | nindent 8 }}
{{- with .Values.podLabels }}
{{- toYaml . | nindent 8 }}
{{- end }}
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "my-web-app.serviceAccountName" . }}
{{- with .Values.podSecurityContext }}
securityContext:
{{- toYaml . | nindent 8 }}
{{- end }}
containers:
- name: {{ .Chart.Name }}
{{- with .Values.securityContext }}
securityContext:
{{- toYaml . | nindent 12 }}
{{- end }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: 80
protocol: TCP
livenessProbe:
httpGet:
path: /
port: http
readinessProbe:
httpGet:
path: /
port: http
{{- with .Values.resources }}
resources:
{{- toYaml . | nindent 12 }}
{{- end }}
volumeMounts:
- name: html
mountPath: /usr/share/nginx/html
readOnly: true
{{- with .Values.volumeMounts }}
{{- toYaml . | nindent 12 }}
{{- end }}
volumes:
- name: html
configMap:
name: {{ include "my-web-app.fullname" . }}-html
{{- with .Values.volumes }}
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
TEMPLATE
success "Deployment template updated with ConfigMap volume mount"
# ── 8. Validate the chart ──
header "Validating Helm Chart"
helm lint "${CHART_NAME}/"
helm template test-release "${CHART_NAME}/" > /dev/null
success "Chart passes lint and template rendering"
# ── 9. Commit and push to the bare repository ──
header "Committing to Git Repository"
git add .
git commit -m "Add ${CHART_NAME} Helm chart for GitOps deployment"
git push origin master 2>/dev/null || git push origin main
success "Chart pushed to Git repository"
# ── 10. Verify the repository contents ──
header "Repository Contents"
echo ""
find "${CHART_NAME}" -type f | sort | while read -r f; do
echo " ${f}"
done
echo ""
info "Bare repo: ${BARE_REPO}"
info "Working dir: ${WORK_DIR}"
info "Chart path: ${CHART_NAME}/"
success "Step 04 complete!"
Repository Structure:¶
helm-apps/
└── my-web-app/
    ├── Chart.yaml              # Chart metadata
    ├── values.yaml             # Default values
    ├── charts/                 # Dependencies (empty)
    ├── templates/
    │   ├── _helpers.tpl        # Named templates
    │   ├── configmap.yaml      # Custom HTML welcome page
    │   ├── deployment.yaml     # Deployment with ConfigMap mount
    │   ├── service.yaml        # ClusterIP service
    │   ├── serviceaccount.yaml # Service account
    │   ├── hpa.yaml            # HPA (disabled by default)
    │   ├── NOTES.txt           # Post-install notes
    │   └── tests/
    │       └── test-connection.yaml
    └── .helmignore
05. Deploy ArgoCD (Offline Install Using Harbor)¶
Deploy ArgoCD using only images from the Harbor registry - a fully airgapped installation.
Scenario:¶
- The cluster has no internet access (simulated by overriding all image references).
- All ArgoCD images are served from harbor.local/argocd/.
- The Helm chart is installed from the locally saved .tgz file (not fetched from the internet).
- ArgoCD is exposed via Ingress on argocd.local.
Explanation:¶
- The Helm image overrides (passed here via a values file, equivalently via --set flags) point every image reference at Harbor instead of the public registries (quay.io, ghcr.io, docker.io).
- global.image.repository overrides the main ArgoCD image for all components.
- Individual overrides are needed for Redis and Dex, since they use different base images.
- server.insecure: true disables TLS on the ArgoCD server (TLS is terminated at the Ingress level instead).
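The same overrides can be passed on the command line instead of a values file. A sketch using --set flags, with the same override paths as the values file in the solution (assumes the chart archive was saved locally in Step 03):

```shell
# Equivalent --set form of the airgap values file (sketch; requires a cluster).
helm upgrade --install argocd /tmp/argocd-airgap/argo-cd-7.7.12.tgz \
  --namespace argocd --create-namespace \
  --set global.image.repository=harbor.local/argocd/argocd \
  --set global.image.tag=v2.13.3 \
  --set redis.image.repository=harbor.local/argocd/redis \
  --set redis.image.tag=7.4.2-alpine \
  --set dex.image.repository=harbor.local/argocd/dex \
  --set dex.image.tag=v2.41.1 \
  --set server.insecure=true
```

A values file is preferred in practice: it can be committed to Git, which keeps the airgap configuration itself under version control.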
Hint: helm install, --set global.image.repository, --set redis.image.repository
Solution
#!/bin/bash
# =============================================================================
# Step 05 - Deploy ArgoCD (Offline Install Using Harbor)
# =============================================================================
set -e
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'
BLUE='\033[0;34m'; CYAN='\033[0;36m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
success() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; exit 1; }
header() { echo -e "\n${CYAN}=== $* ===${NC}"; }
HARBOR_URL="harbor.local"
ARGOCD_CHART="/tmp/argocd-airgap/argo-cd-7.7.12.tgz"
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
# ── 1. Verify the local Helm chart exists ──
header "Verifying Local ArgoCD Helm Chart"
if [ ! -f "${ARGOCD_CHART}" ]; then
warn "Local chart not found at ${ARGOCD_CHART}"
info "Falling back to Helm repository (ensure argo repo is added)"
ARGOCD_CHART="argo/argo-cd"
CHART_VERSION_FLAG="--version 7.7.12"
else
CHART_VERSION_FLAG=""
success "Found local chart: ${ARGOCD_CHART}"
fi
# ── 2. Create ArgoCD values file for offline install ──
header "Creating Airgap Values File"
cat > /tmp/argocd-airgap-values.yaml << EOF
# =============================================================================
# ArgoCD Airgap Values - All images from Harbor (${HARBOR_URL})
# =============================================================================
global:
image:
repository: ${HARBOR_URL}/argocd/argocd
tag: "v2.13.3"
server:
insecure: true
ingress:
enabled: true
ingressClassName: nginx
hostname: argocd.local
annotations:
nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
redis:
image:
repository: ${HARBOR_URL}/argocd/redis
tag: "7.4.2-alpine"
dex:
image:
repository: ${HARBOR_URL}/argocd/dex
tag: "v2.41.1"
EOF
info "Values file created at /tmp/argocd-airgap-values.yaml"
echo ""
cat /tmp/argocd-airgap-values.yaml
echo ""
# ── 3. Install ArgoCD with airgap values ──
header "Installing ArgoCD (Offline / Airgap Mode)"
helm upgrade --install argocd ${ARGOCD_CHART} \
${CHART_VERSION_FLAG} \
--namespace argocd \
--create-namespace \
-f /tmp/argocd-airgap-values.yaml \
--wait --timeout 10m
success "ArgoCD installed in airgap mode"
# ── 4. Verify all pods are running and using Harbor images ──
header "Verifying ArgoCD Pods"
kubectl get pods -n argocd
echo ""
info "Container images in use:"
kubectl get pods -n argocd -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | sort -u
# Verify all images come from Harbor
NON_HARBOR=$(kubectl get pods -n argocd \
-o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | \
grep -v "${HARBOR_URL}" || true)
if [ -z "${NON_HARBOR}" ]; then
success "All images are served from Harbor (${HARBOR_URL})"
else
warn "Some images are NOT from Harbor:"
echo "${NON_HARBOR}"
fi
# ── 5. Configure argocd.local in /etc/hosts ──
header "Configuring argocd.local"
if ! grep -q "argocd.local" /etc/hosts; then
echo "${NODE_IP} argocd.local" | sudo tee -a /etc/hosts
success "Added argocd.local to /etc/hosts"
else
warn "argocd.local already in /etc/hosts"
fi
# ── 6. Verify Ingress ──
header "Verifying ArgoCD Ingress"
kubectl get ingress -n argocd
sleep 5
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" http://argocd.local 2>/dev/null || echo "000")
info "ArgoCD UI HTTP response: ${HTTP_CODE}"
# ── 7. Retrieve admin password ──
header "ArgoCD Admin Credentials"
ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d)
echo ""
info "ArgoCD URL: http://argocd.local"
info "Username: admin"
info "Password: ${ARGOCD_PASSWORD}"
# ── 8. Login with ArgoCD CLI (if installed) ──
header "ArgoCD CLI Login"
if command -v argocd &> /dev/null; then
argocd login argocd.local \
--username admin \
--password "${ARGOCD_PASSWORD}" \
--insecure && \
success "ArgoCD CLI login successful" || \
warn "CLI login failed - try: argocd login argocd.local --insecure"
else
info "ArgoCD CLI not installed. Install with:"
info " brew install argocd (macOS)"
info " curl -sSL -o /usr/local/bin/argocd https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64"
fi
echo ""
success "Step 05 complete! ArgoCD is running in full airgap mode."
Verification Commands:¶
# Check all pods are Running
kubectl get pods -n argocd -o wide
# Confirm images come from Harbor
kubectl get pods -n argocd -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.containers[*]}{.image}{"\n"}{end}{end}'
# Expected output (all images from harbor.local):
# argocd-application-controller-0 harbor.local/argocd/argocd:v2.13.3
# argocd-dex-server-xxxx harbor.local/argocd/dex:v2.41.1
# argocd-redis-xxxx harbor.local/argocd/redis:7.4.2-alpine
# argocd-repo-server-xxxx harbor.local/argocd/argocd:v2.13.3
# argocd-server-xxxx harbor.local/argocd/argocd:v2.13.3
# Test the UI
curl -s -o /dev/null -w "%{http_code}" http://argocd.local
# Expected: 200
06. Create an ArgoCD Application to Deploy the Helm Chart¶
Create an ArgoCD Application manifest that points to the Git repository and deploys the Helm chart with automated sync.
Scenario:¶
- The Git repository (from Step 04) contains a Helm chart.
- ArgoCD should watch this repository, render the Helm chart, and deploy it to the cluster.
- Auto-sync with self-heal ensures the cluster always matches the Git state.
- Any change pushed to Git is automatically deployed.
Explanation:¶
- An ArgoCD Application is a Custom Resource (CR) that defines: which Git repo to watch, which path contains the manifests, and where to deploy them.
- Setting syncPolicy.automated enables auto-sync: ArgoCD polls Git and applies changes without manual intervention.
- selfHeal: true reverts any manual cluster changes back to the Git-defined state.
- prune: true deletes resources that have been removed from Git.
- For Helm charts, ArgoCD auto-detects Chart.yaml and uses helm template to render the manifests.
Hint: argocd app create, kubectl apply -f application.yaml, argocd app sync
Solution
#!/bin/bash
# =============================================================================
# Step 06 - Create an ArgoCD Application to Deploy the Helm Chart
# =============================================================================
set -e
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'
BLUE='\033[0;34m'; CYAN='\033[0;36m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
success() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; exit 1; }
header() { echo -e "\n${CYAN}=== $* ===${NC}"; }
BARE_REPO="/tmp/gitops-lab/helm-apps.git"
APP_NAME="my-web-app"
APP_NAMESPACE="my-web-app"
# ── 1. Register the Git repository with ArgoCD ──
header "Registering Git Repository with ArgoCD"
# For a local bare repo, ArgoCD needs the path to be accessible from inside the cluster.
# Option A: Use a Git server (Gitea, GitLab)
# Option B: Mount the bare repo as a volume (for local testing)
# Option C: Use argocd-repo-server to serve local repos
# For this lab, we'll use the repo-server to access local repos
# by copying the bare repo to a PVC or using a ConfigMap.
# Simplest approach: patch the repo-server to mount the host path.
info "For production: use a Git server (GitHub, GitLab, Gitea)."
info "For this lab: we configure ArgoCD to use a local repo path."
# Create the Application manifest
header "Creating ArgoCD Application Manifest"
cat > /tmp/argocd-app-my-web-app.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-web-app
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
# ββ Replace with your actual Git repository URL ββ
# For GitHub/GitLab:
# repoURL: https://github.com/<your-org>/helm-apps.git
# For local Gitea:
# repoURL: http://gitea.local:3000/<user>/helm-apps.git
repoURL: https://github.com/<your-org>/helm-apps.git
targetRevision: HEAD
path: my-web-app
# Helm-specific configuration
helm:
# Override values for this specific deployment
valuesObject:
replicaCount: 3
welcomePage:
title: "Airgap GitOps App"
message: "Deployed by ArgoCD from Harbor registry!"
backgroundColor: "#0f3460"
textColor: "#e94560"
destination:
server: https://kubernetes.default.svc
namespace: my-web-app
syncPolicy:
automated:
prune: true # Delete resources removed from Git
selfHeal: true # Revert manual cluster changes
syncOptions:
- CreateNamespace=true
- ApplyOutOfSyncOnly=true
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m
EOF
success "Application manifest created"
echo ""
cat /tmp/argocd-app-my-web-app.yaml
echo ""
# ── 2. Apply the Application manifest ──
header "Deploying ArgoCD Application"
kubectl apply -f /tmp/argocd-app-my-web-app.yaml
success "Application created in ArgoCD"
# ── 3. Wait for the application to sync ──
header "Waiting for Application Sync"
info "Waiting for ArgoCD to sync the application..."
if command -v argocd &> /dev/null; then
argocd app wait "${APP_NAME}" --health --sync --timeout 120 && \
success "Application is Synced and Healthy" || \
warn "Sync is still in progress - check the ArgoCD UI"
else
# Wait using kubectl
for i in $(seq 1 30); do
HEALTH=$(kubectl get application "${APP_NAME}" -n argocd \
-o jsonpath='{.status.health.status}' 2>/dev/null || echo "Unknown")
SYNC=$(kubectl get application "${APP_NAME}" -n argocd \
-o jsonpath='{.status.sync.status}' 2>/dev/null || echo "Unknown")
info "Attempt ${i}/30 - Health: ${HEALTH}, Sync: ${SYNC}"
if [ "${HEALTH}" = "Healthy" ] && [ "${SYNC}" = "Synced" ]; then
success "Application is Synced and Healthy!"
break
fi
sleep 5
done
fi
# ── 4. Verify the deployed resources ──
header "Verifying Deployed Resources"
kubectl get all -n "${APP_NAMESPACE}"
echo ""
info "Pods:"
kubectl get pods -n "${APP_NAMESPACE}" -o wide
echo ""
info "Services:"
kubectl get svc -n "${APP_NAMESPACE}"
# ── 5. Verify via ArgoCD ──
header "ArgoCD Application Status"
if command -v argocd &> /dev/null; then
argocd app get "${APP_NAME}"
else
kubectl get application "${APP_NAME}" -n argocd \
-o jsonpath='{.status.sync.status}' && echo ""
kubectl get application "${APP_NAME}" -n argocd \
-o jsonpath='{.status.health.status}' && echo ""
fi
# ── 6. Access the application ──
header "Accessing the Application"
info "Port-forward to access the app:"
info " kubectl port-forward svc/${APP_NAME} -n ${APP_NAMESPACE} 8081:80"
info " open http://localhost:8081"
echo ""
info "Or create an Ingress for http://my-web-app.local"
# ── 7. Test GitOps: push a change and watch auto-sync ──
header "Testing GitOps Workflow"
info "To test auto-sync, modify the Helm chart in Git:"
info ""
info " cd /tmp/gitops-lab/helm-apps-workspace"
info " # Edit my-web-app/values.yaml (change replicaCount to 5)"
info " git add . && git commit -m 'Scale to 5 replicas' && git push"
info ""
info "ArgoCD will detect the change and automatically sync within ~3 minutes."
info "Or trigger manually: argocd app sync ${APP_NAME}"
echo ""
success "Step 06 complete! GitOps pipeline is fully operational."
Alternative: Create the Application via CLI¶
# Using the ArgoCD CLI instead of a manifest file
argocd app create my-web-app \
--repo https://github.com/<your-org>/helm-apps.git \
--path my-web-app \
--dest-server https://kubernetes.default.svc \
--dest-namespace my-web-app \
--sync-policy automated \
--auto-prune \
--self-heal \
--sync-option CreateNamespace=true \
--helm-set replicaCount=3 \
--helm-set welcomePage.title="Airgap GitOps App"
# Sync and wait
argocd app sync my-web-app
argocd app wait my-web-app --health --timeout 120
# Verify
argocd app get my-web-app
kubectl get all -n my-web-app
Application Lifecycle Diagram:¶
graph LR
subgraph Developer
DEV[git push]
end
subgraph Git_Repo [Git Repo]
REPO[helm-apps/\nmy-web-app/]
end
subgraph ArgoCD [ArgoCD airgap mode]
ARGO[Detects change\nRenders Helm\nApplies to K8s]
end
subgraph Kubernetes
        K8S["Namespace: my-web-app\nDeployment (3)\nService (ClusterIP)\nConfigMap (HTML)\nServiceAccount"]
end
DEV --> REPO
ARGO -- poll ~3min --> REPO
ARGO --> K8S
Full Install Script (All Steps)¶
A single script that runs all six steps end-to-end. Each step is modular and can be run independently or as part of this combined installer.
Solution
#!/bin/bash
# =============================================================================
# Harbor + ArgoCD Airgap Full Installer
# Runs all 6 steps: Ingress, Harbor, Image Mirror, Git Repo, ArgoCD, App
# =============================================================================
set -e
# ── Configuration ──
HARBOR_URL="harbor.local"
HARBOR_USER="admin"
HARBOR_PASS="Harbor12345"
ARGOCD_CHART_VERSION="7.7.12"
REPO_BASE="/tmp/gitops-lab"
BARE_REPO="${REPO_BASE}/helm-apps.git"
WORK_DIR="${REPO_BASE}/helm-apps-workspace"
CHART_NAME="my-web-app"
# ── Color definitions ──
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'
BLUE='\033[0;34m'; CYAN='\033[0;36m'; BOLD='\033[1m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
success() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; exit 1; }
header() { echo -e "\n${CYAN}=== $* ===${NC}"; }
banner() { echo -e "\n${BOLD}${CYAN}────────────────────────────────────────────────────${NC}"; \
           echo -e "${BOLD}${CYAN}  $*${NC}"; \
           echo -e "${BOLD}${CYAN}────────────────────────────────────────────────────${NC}"; }
wait_for_pods() {
local namespace=$1
local timeout=${2:-300}
local start=$(date +%s)
info "Waiting for all pods in ${namespace} to be Ready (timeout: ${timeout}s)..."
while true; do
local not_ready=$(kubectl get pods -n "${namespace}" --no-headers 2>/dev/null | \
grep -v "Running\|Completed" | wc -l | tr -d ' ')
if [ "${not_ready}" = "0" ] && [ "$(kubectl get pods -n ${namespace} --no-headers 2>/dev/null | wc -l | tr -d ' ')" -gt 0 ]; then
success "All pods in ${namespace} are Ready"
return 0
fi
local elapsed=$(( $(date +%s) - start ))
if [ ${elapsed} -ge ${timeout} ]; then
warn "Timeout waiting for pods in ${namespace}"
kubectl get pods -n "${namespace}"
return 1
fi
sleep 5
done
}
# ───────────────────────────────────────────────────────────────
banner "STEP 1/6: Install Nginx Ingress Controller + Harbor"
# ───────────────────────────────────────────────────────────────
header "Adding Helm Repositories"
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx 2>/dev/null || true
helm repo add harbor https://helm.goharbor.io 2>/dev/null || true
helm repo add argo https://argoproj.github.io/argo-helm 2>/dev/null || true
helm repo update
header "Installing Nginx Ingress Controller"
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.type=NodePort \
--set controller.service.nodePorts.http=30080 \
--set controller.service.nodePorts.https=30443 \
--set controller.admissionWebhooks.enabled=false \
--wait --timeout 5m
success "Nginx Ingress Controller installed"
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
info "Node IP: ${NODE_IP}"
# ───────────────────────────────────────────────────────────────
banner "STEP 2/6: Install and Configure Harbor (harbor.local)"
# ───────────────────────────────────────────────────────────────
header "Installing Harbor"
helm upgrade --install harbor harbor/harbor \
--namespace harbor \
--create-namespace \
--set expose.type=ingress \
--set expose.ingress.className=nginx \
--set expose.ingress.hosts.core=harbor.local \
--set expose.tls.enabled=false \
--set externalURL=http://harbor.local \
--set harborAdminPassword="${HARBOR_PASS}" \
--set persistence.enabled=false \
--wait --timeout 10m
success "Harbor installed"
# Configure /etc/hosts
if ! grep -q "harbor.local" /etc/hosts; then
echo "${NODE_IP} harbor.local" | sudo tee -a /etc/hosts
fi
if ! grep -q "argocd.local" /etc/hosts; then
echo "${NODE_IP} argocd.local" | sudo tee -a /etc/hosts
fi
# Wait for Harbor to be healthy
header "Waiting for Harbor Health"
for i in $(seq 1 30); do
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" http://harbor.local/api/v2.0/health 2>/dev/null || echo "000")
if [ "${HTTP_CODE}" = "200" ]; then
success "Harbor is healthy"
break
fi
info "Attempt ${i}/30 - HTTP ${HTTP_CODE}"
sleep 10
done
# Create Harbor projects
header "Creating Harbor Projects"
for project in argocd library; do
response=$(curl -s -o /dev/null -w "%{http_code}" \
-X POST "http://harbor.local/api/v2.0/projects" \
-H "Content-Type: application/json" \
-u "${HARBOR_USER}:${HARBOR_PASS}" \
-d "{\"project_name\": \"${project}\", \"public\": true}")
if [ "${response}" = "201" ] || [ "${response}" = "409" ]; then
success "Project ${project} ready"
else
warn "Project ${project} returned HTTP ${response}"
fi
done
# ───────────────────────────────────────────────────────────────
banner "STEP 3/6: Mirror ArgoCD Images to Harbor"
# ───────────────────────────────────────────────────────────────
header "Discovering Required Images"
IMAGES=$(helm template argocd argo/argo-cd \
--version "${ARGOCD_CHART_VERSION}" \
--namespace argocd 2>/dev/null | \
grep -E "image:" | \
sed 's/.*image: *"\?\([^"]*\)"\?.*/\1/' | \
sort -u)
info "Images to mirror:"
echo "${IMAGES}"
header "Pulling, Tagging, and Pushing Images"
docker login "${HARBOR_URL}" -u "${HARBOR_USER}" -p "${HARBOR_PASS}" 2>/dev/null || \
warn "Docker login failed - configure insecure registries first"
while IFS= read -r img; do
[ -z "${img}" ] && continue
local_name=$(echo "${img}" | rev | cut -d'/' -f1 | rev)
target="${HARBOR_URL}/argocd/${local_name}"
info "Mirroring: ${img} β ${target}"
docker pull "${img}" 2>/dev/null && \
docker tag "${img}" "${target}" && \
docker push "${target}" 2>/dev/null && \
success "Mirrored: ${target}" || \
warn "Failed to mirror: ${img}"
done <<< "${IMAGES}"
# Save Helm chart locally
header "Saving ArgoCD Helm Chart Locally"
mkdir -p /tmp/argocd-airgap
helm pull argo/argo-cd --version "${ARGOCD_CHART_VERSION}" --destination /tmp/argocd-airgap/ 2>/dev/null || true
success "Chart saved"
# ───────────────────────────────────────────────────────────────
banner "STEP 4/6: Create Git Repository with Helm Chart"
# ───────────────────────────────────────────────────────────────
header "Setting Up Git Repository"
rm -rf "${REPO_BASE}"
mkdir -p "${REPO_BASE}"
git init --bare "${BARE_REPO}"
git clone "${BARE_REPO}" "${WORK_DIR}"
cd "${WORK_DIR}"
helm create "${CHART_NAME}"
# Customize Chart.yaml
cat > "${CHART_NAME}/Chart.yaml" << 'EOF'
apiVersion: v2
name: my-web-app
description: A simple web application deployed via ArgoCD GitOps
type: application
version: 0.1.0
appVersion: "1.25.0"
EOF
# Customize values.yaml
cat > "${CHART_NAME}/values.yaml" << 'EOF'
replicaCount: 2
image:
repository: nginx
pullPolicy: IfNotPresent
tag: "1.25-alpine"
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
create: true
automount: true
annotations: {}
name: ""
podAnnotations: {}
podLabels: {}
podSecurityContext: {}
securityContext: {}
service:
type: ClusterIP
port: 80
ingress:
enabled: false
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 50m
memory: 64Mi
autoscaling:
enabled: false
volumes: []
volumeMounts: []
nodeSelector: {}
tolerations: []
affinity: {}
welcomePage:
title: "GitOps Demo App"
message: "Deployed by ArgoCD from Harbor airgap registry!"
backgroundColor: "#1a1a2e"
textColor: "#e94560"
EOF
# Create ConfigMap template
cat > "${CHART_NAME}/templates/configmap.yaml" << 'TEMPLATE'
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "my-web-app.fullname" . }}-html
labels:
{{- include "my-web-app.labels" . | nindent 4 }}
data:
index.html: |
<!DOCTYPE html>
<html>
<head><title>{{ .Values.welcomePage.title }}</title>
<style>
body { font-family: Arial, sans-serif; display: flex; justify-content: center;
align-items: center; min-height: 100vh; margin: 0;
background: {{ .Values.welcomePage.backgroundColor }};
color: {{ .Values.welcomePage.textColor }}; }
.container { text-align: center; }
h1 { font-size: 2.5em; }
.info { font-size: 1.2em; margin: 8px 0; color: #eee; }
.badge { display: inline-block; background: {{ .Values.welcomePage.textColor }};
color: white; padding: 5px 15px; border-radius: 20px; margin: 5px; }
</style></head>
<body><div class="container">
<h1>{{ .Values.welcomePage.title }}</h1>
<p class="info">{{ .Values.welcomePage.message }}</p>
<p class="info">
<span class="badge">Release: {{ .Release.Name }}</span>
<span class="badge">Namespace: {{ .Release.Namespace }}</span>
<span class="badge">Chart: {{ .Chart.Name }}-{{ .Chart.Version }}</span>
</p>
</div></body></html>
TEMPLATE
# Update Deployment to mount ConfigMap
cat > "${CHART_NAME}/templates/deployment.yaml" << 'TEMPLATE'
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "my-web-app.fullname" . }}
labels:
{{- include "my-web-app.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "my-web-app.selectorLabels" . | nindent 6 }}
template:
metadata:
annotations:
checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
{{- with .Values.podAnnotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "my-web-app.labels" . | nindent 8 }}
{{- with .Values.podLabels }}
{{- toYaml . | nindent 8 }}
{{- end }}
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "my-web-app.serviceAccountName" . }}
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: 80
protocol: TCP
livenessProbe:
httpGet:
path: /
port: http
readinessProbe:
httpGet:
path: /
port: http
{{- with .Values.resources }}
resources:
{{- toYaml . | nindent 12 }}
{{- end }}
volumeMounts:
- name: html
mountPath: /usr/share/nginx/html
readOnly: true
volumes:
- name: html
configMap:
name: {{ include "my-web-app.fullname" . }}-html
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
TEMPLATE
# Validate and push
helm lint "${CHART_NAME}/"
git add .
git commit -m "Add my-web-app Helm chart"
git push origin master 2>/dev/null || git push origin main
success "Git repository ready"
# ───────────────────────────────────────────────────────────────
banner "STEP 5/6: Deploy ArgoCD (Offline/Airgap Mode)"
# ───────────────────────────────────────────────────────────────
header "Installing ArgoCD from Harbor"
ARGOCD_CHART_FILE="/tmp/argocd-airgap/argo-cd-${ARGOCD_CHART_VERSION}.tgz"
if [ -f "${ARGOCD_CHART_FILE}" ]; then
CHART_SRC="${ARGOCD_CHART_FILE}"
else
CHART_SRC="argo/argo-cd"
fi
cat > /tmp/argocd-airgap-values.yaml << EOF
global:
image:
repository: ${HARBOR_URL}/argocd/argocd
tag: "v2.13.3"
server:
insecure: true
ingress:
enabled: true
ingressClassName: nginx
hostname: argocd.local
annotations:
nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
redis:
image:
repository: ${HARBOR_URL}/argocd/redis
tag: "7.4.2-alpine"
dex:
image:
repository: ${HARBOR_URL}/argocd/dex
tag: "v2.41.1"
EOF
helm upgrade --install argocd "${CHART_SRC}" \
--version "${ARGOCD_CHART_VERSION}" \
--namespace argocd \
--create-namespace \
-f /tmp/argocd-airgap-values.yaml \
--wait --timeout 10m
success "ArgoCD installed in airgap mode"
# Verify images
info "Container images in use:"
kubectl get pods -n argocd -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | sort -u
ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d)
# ───────────────────────────────────────────────────────────────
banner "STEP 6/6: Create ArgoCD Application"
# ───────────────────────────────────────────────────────────────
header "Creating ArgoCD Application for my-web-app"
cat > /tmp/argocd-app-my-web-app.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-web-app
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/<your-org>/helm-apps.git
targetRevision: HEAD
path: my-web-app
helm:
valuesObject:
replicaCount: 3
welcomePage:
title: "Airgap GitOps App"
message: "Deployed by ArgoCD from Harbor registry!"
destination:
server: https://kubernetes.default.svc
namespace: my-web-app
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
EOF
kubectl apply -f /tmp/argocd-app-my-web-app.yaml
success "ArgoCD Application created"
# Wait for sync
header "Waiting for Application to Sync"
for i in $(seq 1 30); do
HEALTH=$(kubectl get application my-web-app -n argocd \
-o jsonpath='{.status.health.status}' 2>/dev/null || echo "Unknown")
SYNC=$(kubectl get application my-web-app -n argocd \
-o jsonpath='{.status.sync.status}' 2>/dev/null || echo "Unknown")
if [ "${HEALTH}" = "Healthy" ] && [ "${SYNC}" = "Synced" ]; then
success "Application is Synced and Healthy!"
break
fi
info "Attempt ${i}/30 - Health: ${HEALTH}, Sync: ${SYNC}"
sleep 5
done
# ───────────────────────────────────────────────────────────────
banner "INSTALLATION COMPLETE"
# ───────────────────────────────────────────────────────────────
echo ""
echo "─────────────────────────────────────────────────────────────"
echo ""
info "Harbor Registry:"
info " URL: http://harbor.local"
info " Username: ${HARBOR_USER}"
info " Password: ${HARBOR_PASS}"
echo ""
info "ArgoCD:"
info " URL: http://argocd.local"
info " Username: admin"
info " Password: ${ARGOCD_PASSWORD}"
echo ""
info "Git Repository:"
info " Bare: ${BARE_REPO}"
info " Workspace: ${WORK_DIR}"
echo ""
info "Application:"
info " Name: my-web-app"
info " Namespace: my-web-app"
info " Access: kubectl port-forward svc/my-web-app -n my-web-app 8081:80"
echo ""
echo "─────────────────────────────────────────────────────────────"
success "All 6 steps completed successfully!"
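
A note on the ArgoCD password step above: Kubernetes Secrets store their values base64-encoded (not encrypted), which is why the script pipes the `jsonpath` output through `base64 -d`. A standalone round-trip illustration, using a made-up password string rather than a real Secret:

```shell
# Secret values are base64-encoded in the API; decoding is a plain
# reversal. The password below is hypothetical, for demonstration only.
encoded=$(printf 'S3cr3tP4ss' | base64)
printf '%s' "$encoded" | base64 -d
echo
```

This is the same transformation `kubectl ... -o jsonpath="{.data.password}" | base64 -d` performs against the `argocd-initial-admin-secret` Secret.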
Quick Reference: Key Commands¶
| Task | Command |
|---|---|
| Check Harbor health | `curl http://harbor.local/api/v2.0/health` |
| List Harbor projects | `curl -u admin:Harbor12345 http://harbor.local/api/v2.0/projects` |
| Docker login to Harbor | `docker login harbor.local -u admin -p Harbor12345` |
| Mirror an image to Harbor | `docker pull IMG && docker tag IMG harbor.local/proj/IMG && docker push harbor.local/proj/IMG` |
| Discover ArgoCD images | `helm template argocd argo/argo-cd \| grep image: \| sort -u` |
| ArgoCD admin password | `kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" \| base64 -d` |
| ArgoCD CLI login | `argocd login argocd.local --insecure --username admin --password PASS` |
| Create ArgoCD app | `kubectl apply -f application.yaml` |
| Sync ArgoCD app | `argocd app sync my-web-app` |
| Check app health | `argocd app get my-web-app` |
| Verify airgap images | `kubectl get pods -n argocd -o jsonpath='{..image}' \| tr ' ' '\n' \| sort -u` |
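
The one-line mirror recipe in the table expands naturally into a small helper. The sketch below only *prints* the `docker` commands rather than running them, so it can be reviewed safely first; the `mirror_image` function name and the `library` project are assumptions for illustration:

```shell
# Hypothetical helper: mirror an upstream image into Harbor by pulling,
# retagging under the Harbor registry/project, and pushing. This sketch
# echoes the commands instead of executing them.
mirror_image() {
  src="$1"                                  # e.g. docker.io/library/nginx:1.27
  dest="harbor.local/library/${src##*/}"    # keep name:tag, swap the registry
  echo "docker pull $src"
  echo "docker tag $src $dest"
  echo "docker push $dest"
}
mirror_image "docker.io/library/nginx:1.27"
```

Replace the `echo`s with direct invocations (or pipe the output to `sh`) once the target project exists in Harbor and `docker login harbor.local` has succeeded.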
Cleanup¶
# Remove ArgoCD application
argocd app delete my-web-app --yes 2>/dev/null || \
kubectl delete application my-web-app -n argocd
# Uninstall ArgoCD
helm uninstall argocd -n argocd
kubectl delete namespace argocd
# Uninstall Harbor
helm uninstall harbor -n harbor
kubectl delete namespace harbor
# Uninstall Ingress Controller
helm uninstall ingress-nginx -n ingress-nginx
kubectl delete namespace ingress-nginx
# Remove namespaces
kubectl delete namespace my-web-app 2>/dev/null || true
# Clean up local files
rm -rf /tmp/gitops-lab /tmp/argocd-airgap /tmp/argocd-airgap-values.yaml /tmp/argocd-app-my-web-app.yaml
# Remove /etc/hosts entries
# Remove /etc/hosts entries (on macOS/BSD sed, use: sudo sed -i '' ...)
sudo sed -i '/harbor.local/d' /etc/hosts
sudo sed -i '/argocd.local/d' /etc/hosts
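
The `sed` deletions above can be rehearsed against a scratch copy before touching the real `/etc/hosts` (GNU `sed` in-place syntax shown; macOS/BSD `sed` requires `-i ''`):

```shell
# Rehearse the hosts cleanup on a temporary file: seed it with sample
# entries, delete the lab hostnames, and confirm only unrelated lines remain.
tmp=$(mktemp)
printf '127.0.0.1 localhost\n127.0.0.1 harbor.local\n127.0.0.1 argocd.local\n' > "$tmp"
sed -i '/harbor.local/d; /argocd.local/d' "$tmp"
cat "$tmp"    # only the localhost line should survive
rm -f "$tmp"
```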